dragon_9000 2015-11-14 22:27

Comparing delimited files for database updates

Any suggestions on packages (or methodologies) that might help with this? I need to take a ~40 MB file we receive weekly and determine what changed between the previous file and the current one. Those changes then need to be applied to a single, simple database table. In a previous life I accomplished something similar with Linux "diff" and the -Hae parameters, which produces an "ed script". That output was then handled by a Perl program, using Tie::File to reference the changed records in the previous file. In an effort to strengthen my Go skills, I'm trying to use Go for this task. https://github.com/sergi/go-diff looks like it might be the ticket, but I'm not sure its "patch" output will do what I need (easily).

Fixed-width and/or delimited text files are still commonly used. Does anyone have samples, pointers, or suggestions on packages that might help in dealing with them in this way?

1 answer

  • dpj96988 2015-11-20 20:46

    Are you sure you need the intermediate step? 40 MB is not very much, and your database engine is specialized in handling data like this.

    For instance, with PostgreSQL, just load the new data into a temporary table:

    create table temptable (
     a varchar,
     b varchar,
     c varchar
    );
    copy temptable from '/path/to/csv/newdata.txt' delimiter ',' csv;
    

    Then you can use a simple SQL query to get the rows that have no exact match in the old data, for example:

    select *
    from temptable t
    where not exists (
     select 1
     from oldtable o
     where t.a=o.a and t.b=o.b and t.c=o.c
    )
    

    If you did not save the data from the previous week's batch, just remember to copy it to another table for storage. Now the real question is what you want to do with the information, but you should be able to handle most scenarios.
