bigben446 2022-02-03 21:50

Shell script: extract keywords from a txt file and count how often each one occurs

I have a test.txt file. I want a shell script that extracts specific keywords from it and counts how many times each keyword appears.

Keyword pattern: **SRA:keyword.**, i.e. the prefix "SRA:", then the keyword, then a terminating ".". Examples:

In SRA:SRR10168379.12205864.1, the keyword is SRR10168379
In SRA:SRR10168392.8392060.2, the keyword is SRR10168392
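A minimal sketch of the extraction rule for a single ID, assuming every subject ID has the form `SRA:<keyword>.<rest>` as in the two examples above. `extract_keyword` is a hypothetical name; the body uses only POSIX parameter expansion.

```shell
#!/bin/sh
# Hypothetical helper: print the keyword from one subject ID such as
# "SRA:SRR10168379.12205864.1" (assumed format: SRA:<keyword>.<rest>).
extract_keyword() {
    id=${1#SRA:}               # drop the leading "SRA:" prefix
    printf '%s\n' "${id%%.*}"  # keep everything before the first "."
}
```

For example, `extract_keyword SRA:SRR10168392.8392060.2` prints `SRR10168392`. Because it strips the `SRA:` prefix and the longest `.*` suffix instead of cutting fixed character positions, the keyword's length does not matter.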

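The desired result, res.txt, should list each keyword followed by a separator and its count (次 means "times"), sorted from most to least frequent, for example: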
SRR10168379——7次
SRR10168392——2次
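One way to produce that ranking with standard text tools, sketched as a hypothetical shell function `count_keywords`. It assumes "SRA:" appears only in the subject column (true for the header lines in this file) and relies on `grep -o`, which GNU and BSD grep support but POSIX does not require. The separator defaults to `---`; the question's separator and the 次 suffix can be passed as the second and third arguments.

```shell
#!/bin/sh
# Hypothetical helper: count how often each keyword appears in a BLAST
# tabular file and print "keyword<SEP>count<SUFFIX>", most frequent first.
# Usage: count_keywords FILE [SEP] [SUFFIX]   (SEP defaults to "---")
count_keywords() {
    file=$1
    sep=${2:-"---"}
    suffix=${3:-}
    # grep -o prints only the matched text, e.g. "SRA:SRR10168379";
    # [^.]* stops at the first "." after the keyword.
    grep -o 'SRA:[^.]*' "$file" |
        cut -d: -f2 |     # strip the "SRA:" prefix
        sort | uniq -c |  # uniq -c needs sorted input; prints "count keyword"
        sort -rn |        # highest count first
        awk -v sep="$sep" -v suffix="$suffix" '{ print $2 sep $1 suffix }'
}
```

For the test.txt below, `count_keywords test.txt > res.txt` puts `SRR10168379---12` on the first line and `SRR10168393---3` on the second, followed by the four keywords that occur once.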

Contents of test.txt:

# tblastn
# Iteration: 0
# Query: 
# RID: ZRD35BAF013
# Database: SRR10168375 SRR10168376 SRR10168377 SRR10168378 SRR10168379 SRR10168381 SRR10168392 SRR10168393 SRR13285085 SRR13285570
# Fields: query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score, % positives, query/sbjct frames
# 100 hits found
Query_45991    SRA:SRR10168379.12205864.1    34.694    49    32    0    527    575    148    2    4.6    36.6    53.06    0/-3
Query_45991    SRA:SRR10168379.10841544.1    41.667    48    20    1    187    226    144    1    5.7    36.2    58.33    0/-1
Query_45991    SRA:SRR10168392.8392060.2    47.059    34    13    1    194    222    27    128    11    35.4    61.76    0/3
Query_45991    SRA:SRR10168393.9810230.1    41.304    46    19    1    187    224    1    138    15    35.0    58.70    0/1
Query_45991    SRA:SRR10168379.2460949.2    41.304    46    19    1    187    224    1    138    18    34.7    58.70    0/1
Query_45991    SRA:SRR10168393.20965295.2    42.222    45    18    1    188    224    1    135    20    34.7    57.78    0/1
Query_45991    SRA:SRR10168376.8708660.2    43.902    41    15    1    192    224    1    123    28    34.3    58.54    0/1
Query_45991    SRA:SRR10168379.12533534.1    40.000    50    22    1    187    228    150    1    31    34.3    56.00    0/-1
Query_45991    SRA:SRR10168379.6639135.2    41.304    46    19    1    187    224    141    4    34    33.9    58.70    0/-1
Query_45991    SRA:SRR10168379.13010027.2    39.583    48    21    1    187    226    1    144    41    33.9    58.33    0/1
Query_45991    SRA:SRR10168381.1806861.1    40.816    49    21    1    188    228    150    4    41    33.9    55.10    0/-1
Query_45991    SRA:SRR10168379.3721520.2    40.816    49    21    1    188    228    150    4    41    33.9    55.10    0/-1
Query_45991    SRA:SRR10168378.17299083.1    41.304    46    19    1    187    224    139    2    42    33.9    56.52    0/-3
Query_45991    SRA:SRR10168393.17810994.2    39.583    48    21    1    187    226    3    146    46    33.5    58.33    0/3
Query_45991    SRA:SRR10168379.2656880.1    41.304    46    19    1    187    224    144    7    53    33.5    58.70    0/-1
Query_45991    SRA:SRR10168379.1997738.2    41.304    46    19    1    187    224    146    9    53    33.5    58.70    0/-2
Query_45991    SRA:SRR10168379.11604415.1    41.304    46    19    1    187    224    149    12    55    33.5    56.52    0/-2
Query_45991    SRA:SRR10168379.11899618.1    39.583    48    21    1    187    226    147    4    57    33.5    56.25    0/-1
Query_45991    SRA:SRR10168379.4610022.2    39.583    48    21    1    187    226    147    4    57    33.5    56.25    0/-1



  • _GX_ 2022-02-03 22:48
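    $ cat test.awk
    # Skip the "#" header lines of the BLAST tabular output (and blank lines).
    !/^[ \t]*#/ && NF {
        # Column 2 looks like "SRA:SRR10168379.12205864.1";
        # the keyword is the text between "SRA:" and the first ".".
        split($2, parts, /[:.]/);
        count[parts[2]]++;
    } END {
        # Print each keyword with its number of occurrences.
        for (keyword in count)
            print keyword "---" count[keyword];
    }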
    $ awk -f test.awk test.txt > res.txt
    $ cat res.txt
    SRR10168392---1
    SRR10168376---1
    SRR10168381---1
    SRR10168393---3
    SRR10168378---1
    SRR10168379---12
    
    This answer was accepted by the asker as the best answer.
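The accepted script prints the counts in `for (keyword in count)` order, which awk leaves unspecified, whereas the question asks for the list sorted by count. A minimal sketch of a fix, assuming the `keyword---count` lines shown above and keywords that contain no `-`: with `-` as the delimiter, `sort` splits `SRR10168379---12` into `SRR10168379`, an empty field, another empty field, and `12`, so the count is key 4. `sort_by_count` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: sort "keyword---count" lines by count, highest first.
# With -t '-', "SRR10168379---12" splits into "SRR10168379", "", "", "12",
# so the count is field 4 (assumes keywords contain no "-").
# Reads the named files, or standard input when no file is given.
sort_by_count() {
    sort -t '-' -k4,4nr "$@"
}
```

For example, `awk -f test.awk test.txt | sort_by_count > res.txt` would put `SRR10168379---12` first and `SRR10168393---3` second.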

Question events:

  • Closed by the system on February 11
  • Answer accepted on February 3
  • Question created on February 3
