douxun4173 2013-05-27 19:31 采纳率: 0%
浏览 57
已采纳

大数据:逐行或按CSV处理SQL插入/更新或合并最佳?

So basically I have a bunch of 1 Gig data files (compressed) with just text files containing JSON data with timestamps and other stuff.

I will be using PHP code to insert this data into MYSQL database.

I will not be able to store these text files in memory! Therefor I have to process each data-file line by line. To do this I am using stream_get_line().

  • Some of the data contained will be updates, some will be inserts.

Question Would it be faster to use Insert / Select / Update statements, or create a CSV file and import it that way?

Create a file thats a bulk operation and then execute it from sql?

I need to basically insert data with a primary key that doesnt exist, and update fields on data if the primary key does exist. But I will be doing this in LARGE Quantities.

Performance is always and issue.

Update The table has 22,000 Columns, and only say 10-20 of them do not contain 0.

  • 写回答

1条回答 默认 最新

  • duanchifo2866 2013-05-27 20:01
    关注

    I would load all of the data to a temporary table and let mysql do the heavy lifting.

    1. create the temporary table by doing create table temp_table as select * from live_table where 1=0;

    2. Read the file and create a data product that is compatible for loading with load data infile.

    3. Load the data into the temporary table and add an index for your primary key

    4. Next Isolate you updates by doing a inner query between the live table and the temporary table. walk through and do your updates.

    5. remove all of your updates from the temporary (again using an inner join between it and the live table).

    6. process all of the inserts with a simple insert into live_table as select * from temp_table.

    7. drop the temporary table, go home and have a frosty beverage.

    This may be over simplified for your use case but with a little tweaking it should work a treat.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 安卓adb backup备份应用数据失败
  • ¥15 eclipse运行项目时遇到的问题
  • ¥15 关于#c##的问题:最近需要用CAT工具Trados进行一些开发
  • ¥15 南大pa1 小游戏没有界面,并且报了如下错误,尝试过换显卡驱动,但是好像不行
  • ¥15 没有证书,nginx怎么反向代理到只能接受https的公网网站
  • ¥50 成都蓉城足球俱乐部小程序抢票
  • ¥15 yolov7训练自己的数据集
  • ¥15 esp8266与51单片机连接问题(标签-单片机|关键词-串口)(相关搜索:51单片机|单片机|测试代码)
  • ¥15 电力市场出清matlab yalmip kkt 双层优化问题
  • ¥30 ros小车路径规划实现不了,如何解决?(操作系统-ubuntu)