bigben446 2022-02-03 21:50

Shell script to extract keywords from a txt file and count their occurrences

I have a file, test.txt, from which I want to extract specific keywords and count how many times each one occurs, using a shell script.

The keyword pattern is **SRA:keyword.**, that is, the prefix "SRA:", then the keyword, then a terminating ".".

In SRA:SRR10168379.12205864.1 the keyword is SRR10168379
In SRA:SRR10168392.8392060.2 the keyword is SRR10168392
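The pattern described above can be written as a regular expression. The following is only a sketch of the extraction step, applied to the two example IDs; the output file name `keywords.txt` is made up for illustration, and it assumes the keyword itself never contains a ".".

```shell
# Feed the two example IDs from the question through sed.
# \([^.]*\) captures everything between "SRA:" and the first ".".
printf '%s\n' 'SRA:SRR10168379.12205864.1' 'SRA:SRR10168392.8392060.2' |
  sed -n 's/^SRA:\([^.]*\)\..*$/\1/p' > keywords.txt   # keywords.txt: hypothetical name

cat keywords.txt
# SRR10168379
# SRR10168392
```

Lines that do not match the pattern print nothing, because `-n` suppresses default output and only successful substitutions are printed.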

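The desired result file, res.txt, should list each keyword followed by "--" and its count, sorted by count with the highest first:
SRR10168379--7 times
SRR10168392--2 times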
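Counting and ordering of this kind is commonly done with the `sort | uniq -c | sort` idiom. The sketch below assumes the keywords have already been extracted, one per line; the keyword list and the file name `counts.txt` are made up for illustration, and the exact separator and any suffix after the count can be changed in the final `awk` step.

```shell
# Hypothetical list of already-extracted keywords, one per line.
printf '%s\n' SRR10168379 SRR10168392 SRR10168379 \
              SRR10168393 SRR10168379 SRR10168393 |
  sort |                 # group identical keywords together
  uniq -c |              # prefix each distinct keyword with its count
  sort -k1,1nr |         # order by count, highest first
  awk '{ print $2 "--" $1 }' > counts.txt   # reformat as keyword--count

cat counts.txt
# SRR10168379--3
# SRR10168393--2
# SRR10168392--1
```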

The content of test.txt is:

# tblastn
# Iteration: 0
# Query: 
# RID: ZRD35BAF013
# Database: SRR10168375 SRR10168376 SRR10168377 SRR10168378 SRR10168379 SRR10168381 SRR10168392 SRR10168393 SRR13285085 SRR13285570
# Fields: query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score, % positives, query/sbjct frames
# 100 hits found
Query_45991    SRA:SRR10168379.12205864.1    34.694    49    32    0    527    575    148    2    4.6    36.6    53.06    0/-3
Query_45991    SRA:SRR10168379.10841544.1    41.667    48    20    1    187    226    144    1    5.7    36.2    58.33    0/-1
Query_45991    SRA:SRR10168392.8392060.2    47.059    34    13    1    194    222    27    128    11    35.4    61.76    0/3
Query_45991    SRA:SRR10168393.9810230.1    41.304    46    19    1    187    224    1    138    15    35.0    58.70    0/1
Query_45991    SRA:SRR10168379.2460949.2    41.304    46    19    1    187    224    1    138    18    34.7    58.70    0/1
Query_45991    SRA:SRR10168393.20965295.2    42.222    45    18    1    188    224    1    135    20    34.7    57.78    0/1
Query_45991    SRA:SRR10168376.8708660.2    43.902    41    15    1    192    224    1    123    28    34.3    58.54    0/1
Query_45991    SRA:SRR10168379.12533534.1    40.000    50    22    1    187    228    150    1    31    34.3    56.00    0/-1
Query_45991    SRA:SRR10168379.6639135.2    41.304    46    19    1    187    224    141    4    34    33.9    58.70    0/-1
Query_45991    SRA:SRR10168379.13010027.2    39.583    48    21    1    187    226    1    144    41    33.9    58.33    0/1
Query_45991    SRA:SRR10168381.1806861.1    40.816    49    21    1    188    228    150    4    41    33.9    55.10    0/-1
Query_45991    SRA:SRR10168379.3721520.2    40.816    49    21    1    188    228    150    4    41    33.9    55.10    0/-1
Query_45991    SRA:SRR10168378.17299083.1    41.304    46    19    1    187    224    139    2    42    33.9    56.52    0/-3
Query_45991    SRA:SRR10168393.17810994.2    39.583    48    21    1    187    226    3    146    46    33.5    58.33    0/3
Query_45991    SRA:SRR10168379.2656880.1    41.304    46    19    1    187    224    144    7    53    33.5    58.70    0/-1
Query_45991    SRA:SRR10168379.1997738.2    41.304    46    19    1    187    224    146    9    53    33.5    58.70    0/-2
Query_45991    SRA:SRR10168379.11604415.1    41.304    46    19    1    187    224    149    12    55    33.5    56.52    0/-2
Query_45991    SRA:SRR10168379.11899618.1    39.583    48    21    1    187    226    147    4    57    33.5    56.25    0/-1
Query_45991    SRA:SRR10168379.4610022.2    39.583    48    21    1    187    226    147    4    57    33.5    56.25    0/-1


4 answers

  • _GX_ 2022-02-03 22:48
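    $ cat test.awk
    # Skip comment lines; count only rows whose 2nd field starts with "SRA:".
    !/^[ \t]*#/ && $2 ~ /^SRA:/ {
        split($2, parts, ".");          # parts[1] is e.g. "SRA:SRR10168379"
        keyword = substr(parts[1], 5);  # drop the 4-character "SRA:" prefix
        count[keyword]++;
    } END {
        for (keyword in count)          # iteration order is unspecified
            print keyword "---" count[keyword];
    }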
    $ cat test.txt
    # tblastn
    # Iteration: 0
    # Query: 
    # RID: ZRD35BAF013
    # Database: SRR10168375 SRR10168376 SRR10168377 SRR10168378 SRR10168379 SRR10168381 SRR10168392 SRR10168393 SRR13285085 SRR13285570
    # Fields: query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score, % positives, query/sbjct frames
    # 100 hits found
    Query_45991    SRA:SRR10168379.12205864.1    34.694    49    32    0    527    575    148    2    4.6    36.6    53.06    0/-3
    Query_45991    SRA:SRR10168379.10841544.1    41.667    48    20    1    187    226    144    1    5.7    36.2    58.33    0/-1
    Query_45991    SRA:SRR10168392.8392060.2    47.059    34    13    1    194    222    27    128    11    35.4    61.76    0/3
    Query_45991    SRA:SRR10168393.9810230.1    41.304    46    19    1    187    224    1    138    15    35.0    58.70    0/1
    Query_45991    SRA:SRR10168379.2460949.2    41.304    46    19    1    187    224    1    138    18    34.7    58.70    0/1
    Query_45991    SRA:SRR10168393.20965295.2    42.222    45    18    1    188    224    1    135    20    34.7    57.78    0/1
    Query_45991    SRA:SRR10168376.8708660.2    43.902    41    15    1    192    224    1    123    28    34.3    58.54    0/1
    Query_45991    SRA:SRR10168379.12533534.1    40.000    50    22    1    187    228    150    1    31    34.3    56.00    0/-1
    Query_45991    SRA:SRR10168379.6639135.2    41.304    46    19    1    187    224    141    4    34    33.9    58.70    0/-1
    Query_45991    SRA:SRR10168379.13010027.2    39.583    48    21    1    187    226    1    144    41    33.9    58.33    0/1
    Query_45991    SRA:SRR10168381.1806861.1    40.816    49    21    1    188    228    150    4    41    33.9    55.10    0/-1
    Query_45991    SRA:SRR10168379.3721520.2    40.816    49    21    1    188    228    150    4    41    33.9    55.10    0/-1
    Query_45991    SRA:SRR10168378.17299083.1    41.304    46    19    1    187    224    139    2    42    33.9    56.52    0/-3
    Query_45991    SRA:SRR10168393.17810994.2    39.583    48    21    1    187    226    3    146    46    33.5    58.33    0/3
    Query_45991    SRA:SRR10168379.2656880.1    41.304    46    19    1    187    224    144    7    53    33.5    58.70    0/-1
    Query_45991    SRA:SRR10168379.1997738.2    41.304    46    19    1    187    224    146    9    53    33.5    58.70    0/-2
    Query_45991    SRA:SRR10168379.11604415.1    41.304    46    19    1    187    224    149    12    55    33.5    56.52    0/-2
    Query_45991    SRA:SRR10168379.11899618.1    39.583    48    21    1    187    226    147    4    57    33.5    56.25    0/-1
    Query_45991    SRA:SRR10168379.4610022.2    39.583    48    21    1    187    226    147    4    57    33.5    56.25    0/-1
    $ cat test.txt | awk -f test.awk > res.txt
    $ cat res.txt
    SRR10168392---1
    SRR10168376---1
    SRR10168381---1
    SRR10168393---3
    SRR10168378---1
    SRR10168379---12
    
    This answer was selected by the asker as the best answer.
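The res.txt shown in this answer comes out in awk's arbitrary array order, while the question asks for the result sorted by count. One possible way to get that, sketched here under a few assumptions, is to pipe the awk output through `sort`. The file name `sample.txt` is made up and holds only six rows copied from the question; the awk program is inlined so the example is self-contained, and the keyword is assumed not to contain "-".

```shell
# A few rows copied from the question's test.txt (sample.txt is a hypothetical name).
cat > sample.txt <<'EOF'
# tblastn
# 100 hits found
Query_45991    SRA:SRR10168379.12205864.1    34.694    49    32    0    527    575    148    2    4.6    36.6    53.06    0/-3
Query_45991    SRA:SRR10168379.10841544.1    41.667    48    20    1    187    226    144    1    5.7    36.2    58.33    0/-1
Query_45991    SRA:SRR10168392.8392060.2    47.059    34    13    1    194    222    27    128    11    35.4    61.76    0/3
Query_45991    SRA:SRR10168393.9810230.1    41.304    46    19    1    187    224    1    138    15    35.0    58.70    0/1
Query_45991    SRA:SRR10168379.2460949.2    41.304    46    19    1    187    224    1    138    18    34.7    58.70    0/1
Query_45991    SRA:SRR10168393.20965295.2    42.222    45    18    1    188    224    1    135    20    34.7    57.78    0/1
EOF

# Count keywords per row, print keyword---count, then sort on the count.
# With "-" as the field separator, "KEY---N" splits into KEY, "", "", N,
# so the count is field 4; "nr" sorts it numerically, highest first.
awk '!/^[ \t]*#/ && $2 ~ /^SRA:/ {
        split($2, parts, ".")
        count[substr(parts[1], 5)]++
     }
     END { for (k in count) print k "---" count[k] }' sample.txt |
  sort -t- -k4,4nr > res.txt

cat res.txt
# SRR10168379---3
# SRR10168393---2
# SRR10168392---1
```

Keywords with equal counts may appear in any relative order unless a second sort key is added.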

Question events

  • Closed by the system on Feb 11
  • Answer accepted on Feb 3
  • Question created on Feb 3
