yaycici 2014-07-22 09:03
浏览 3301

join:file is not in sorted order

linux下使用join关联两个文件
两文件1和2内容如下:
[c7@db1 shconfig]$ cat 1
1|1|目3|的|0
3|1|Test|dd|0
4|1|jjjj|dd|0
6|1|等发达|大水|0
7|1|3组| |0
10|1|sd|好的|0
11|2|恩呢| |2

[c7@db1 shconfig]$ cat 2
35|2|1|Jul 10 2014 4:03:52.000000PM|1|67.0|66.0|1|test|2|
37|1|1|Jul 10 2014 2:56:34.000000PM|26|200.0|30.0|1|dddd|0|
39|1|3|Jul 10 2014 2:57:03.000000PM|6|100.0|50.0|1|aaaaa|1|
40|1|3|Jul 10 2014 2:57:03.000000PM|6|60.0|10.0|1|aaaaa|1|
34|1|10|Jul 10 2014 1:23:01.000000PM|1|00.0|60.0|1|lllll|2|

希望经过关联文件1的第一列和文件2的第三列之后的结果是这样的:
1|1|目3|的|0|35|2|Jul 10 2014 4:03:52.000000PM|1|67.0|66.0|1|test|2|
1|1|目3|的|0|37|1|Jul 10 2014 2:56:34.000000PM|26|200.0|30.0|1|dddd|0|
3|1|Test|dd|0|39|1|Jul 10 2014 2:57:03.000000PM|6|100.0|50.0|1|aaaaa|1|
3|1|Test|dd|0|40|1|Jul 10 2014 2:57:03.000000PM|6|60.0|10.0|1|aaaaa|1|
10|1|sd|好的|0|34|1|Jul 10 2014 1:23:01.000000PM|1|00.0|60.0|1|lllll|2|

但是对文件1和2进行排序后以join关联结果如下:
[c7@db1 shconfig]$ sort -k1 -t'|' 1 > 3
[c7@db1 shconfig]$ sort -k3 -t'|' 2 > 4
[c7@db1 shconfig]$ join -j1 1 -j2 3 -t'|' 3 4
10|1|sd|好的|0|34|1|Jul 10 2014 1:23:01.000000PM|1|00.0|60.0|1|lllll|2|
join: file 1 is not in sorted order
3|1|Test|dd|0|39|1|Jul 10 2014 2:57:03.000000PM|6|100.0|50.0|1|aaaaa|1|
3|1|Test|dd|0|40|1|Jul 10 2014 2:57:03.000000PM|6|60.0|10.0|1|aaaaa|1|
附:[c7@db1 shconfig]$ cat 3
10|1|sd|好的|0
11|2|恩呢| |2
1|1|目3|的|0
3|1|Test|dd|0
4|1|jjjj|dd|0
6|1|等发达|大水|0
7|1|3组| |0
[c7@db1 shconfig]$ cat 4
34|1|10|Jul 10 2014 1:23:01.000000PM|1|00.0|60.0|1|lllll|2|
37|1|1|Jul 10 2014 2:56:34.000000PM|26|200.0|30.0|1|dddd|0|
35|2|1|Jul 10 2014 4:03:52.000000PM|1|67.0|66.0|1|test|2|
39|1|3|Jul 10 2014 2:57:03.000000PM|6|100.0|50.0|1|aaaaa|1|
40|1|3|Jul 10 2014 2:57:03.000000PM|6|60.0|10.0|1|aaaaa|1|

再用以数字排序后进行关联的方式:
[c7@db1 shconfig]$ sort -n -k1 -t'|' 1 > 5
[c7@db1 shconfig]$ sort -n -k3 -t'|' 2 > 6
[c7@db1 shconfig]$ join -j1 1 -j2 3 -t'|' 5 6
1|1|目3|的|0|35|2|Jul 10 2014 4:03:52.000000PM|1|67.0|66.0|1|test|2|
1|1|目3|的|0|37|1|Jul 10 2014 2:56:34.000000PM|26|200.0|30.0|1|dddd|0|
3|1|Test|dd|0|39|1|Jul 10 2014 2:57:03.000000PM|6|100.0|50.0|1|aaaaa|1|
3|1|Test|dd|0|40|1|Jul 10 2014 2:57:03.000000PM|6|60.0|10.0|1|aaaaa|1|
join: file 1 is not in sorted order

附:
[c7@db1 shconfig]$ cat 5
1|1|目3|的|0
3|1|Test|dd|0
4|1|jjjj|dd|0
6|1|等发达|大水|0
7|1|3组| |0
10|1|sd|好的|0
11|2|恩呢| |2
[c7@db1 shconfig]$ cat 6
35|2|1|Jul 10 2014 4:03:52.000000PM|1|67.0|66.0|1|test|2|
37|1|1|Jul 10 2014 2:56:34.000000PM|26|200.0|30.0|1|dddd|0|
39|1|3|Jul 10 2014 2:57:03.000000PM|6|100.0|50.0|1|aaaaa|1|
40|1|3|Jul 10 2014 2:57:03.000000PM|6|60.0|10.0|1|aaaaa|1|
34|1|10|Jul 10 2014 1:23:01.000000PM|1|00.0|60.0|1|lllll|2|

实在搞不懂为什么,如果大家有解决办法请务必告知,谢谢。。

附:版本及LANG
[c7@db1 shconfig]$ sort --version
sort (GNU coreutils) 8.4
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and Paul Eggert.
[c7@db1 shconfig]$ join --version
join (GNU coreutils) 8.4
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel.
[c7@db1 shconfig]$ echo $LANG
C

  • 写回答

0条回答 默认 最新

    报告相同问题?

    悬赏问题

    • ¥15 基于卷积神经网络的声纹识别
    • ¥15 Python中的request,如何使用ssr节点,通过代理requests网页。本人在泰国,需要用大陆ip才能玩网页游戏,合法合规。
    • ¥100 为什么这个恒流源电路不能恒流?
    • ¥15 有偿求跨组件数据流路径图
    • ¥15 写一个方法checkPerson,入参实体类Person,出参布尔值
    • ¥15 我想咨询一下路面纹理三维点云数据处理的一些问题,上传的坐标文件里是怎么对无序点进行编号的,以及xy坐标在处理的时候是进行整体模型分片处理的吗
    • ¥15 CSAPPattacklab
    • ¥15 一直显示正在等待HID—ISP
    • ¥15 Python turtle 画图
    • ¥15 stm32开发clion时遇到的编译问题