douxun4173
2013-05-27 19:31
Acceptance rate: 0%
Viewed 56 times
Accepted

Big data: is it best to process SQL inserts/updates line by line, via CSV import, or with a merge?

So basically I have a bunch of 1 GB data files (compressed) that are just text files containing JSON data with timestamps and other fields.

I will be using PHP code to insert this data into a MySQL database.

I will not be able to hold these text files in memory! Therefore I have to process each data file line by line. To do this I am using stream_get_line().
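The constant-memory, line-by-line approach can be sketched as follows. This is a Python analogue of the PHP stream_get_line() loop (the file name and record fields are made up for illustration); it reads a gzip-compressed newline-delimited JSON file one record at a time without ever loading the whole file:

```python
import gzip
import json

def stream_records(path):
    """Yield one decoded JSON record at a time from a gzipped
    newline-delimited JSON file, without loading it all into memory."""
    with gzip.open(path, mode="rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)

# Build a small sample file to demonstrate the streaming read.
with gzip.open("sample.json.gz", mode="wt", encoding="utf-8") as fh:
    fh.write(json.dumps({"id": 1, "ts": "2013-05-27T19:31:00"}) + "\n")
    fh.write(json.dumps({"id": 2, "ts": "2013-05-27T19:32:00"}) + "\n")

records = list(stream_records("sample.json.gz"))
```

Because stream_records is a generator, memory use stays flat regardless of how large the input file is, which matches the constraint described above.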

  • Some of the incoming rows will be updates, and some will be inserts.

Question: Would it be faster to use INSERT / SELECT / UPDATE statements, or to create a CSV file and import it in bulk?

That is, create a file as one bulk operation and then execute it from SQL?

Basically, I need to insert a row when its primary key doesn't exist, and update fields on the existing row when the primary key does exist. But I will be doing this in LARGE quantities.

Performance is always an issue.

Update: The table has 22,000 columns, and only, say, 10-20 of them do not contain 0.
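One option for the insert-or-update requirement is MySQL's single-statement upsert, INSERT ... ON DUPLICATE KEY UPDATE, which avoids a separate SELECT to check whether the key exists. As a runnable sketch, the same idea is shown here with Python's sqlite3 module, whose ON CONFLICT clause (SQLite 3.24+) is the equivalent construct; table and column names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE live (id INTEGER PRIMARY KEY, val INTEGER)")
conn.execute("INSERT INTO live VALUES (1, 10)")  # id 1 already exists

# id 1 collides with an existing primary key (update);
# id 2 is new (insert). One statement handles both cases.
rows = [(1, 99), (2, 20)]
conn.executemany(
    "INSERT INTO live (id, val) VALUES (?, ?) "
    "ON CONFLICT(id) DO UPDATE SET val = excluded.val",
    rows,
)
result = dict(conn.execute("SELECT id, val FROM live ORDER BY id"))
```

In MySQL the last statement would read INSERT INTO live (id, val) VALUES (...) ON DUPLICATE KEY UPDATE val = VALUES(val), and executemany-style batching (many value tuples per statement) is usually what makes this fast at large volumes.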

1 answer

  • duanchifo2866 2013-05-27 20:01
    Accepted

    I would load all of the data into a temporary table and let MySQL do the heavy lifting.

    1. Create the temporary table with create table temp_table as select * from live_table where 1=0; (this copies the column definitions but no rows, and no indexes).

    2. Read the file and transform it into a format that is compatible with load data infile.

    3. Load the data into the temporary table and add an index on your primary key column.

    4. Next, isolate your updates with an inner join between the live table and the temporary table, then walk through the matches and apply your updates.

    5. Remove all of the update rows from the temporary table (again using an inner join between it and the live table).

    6. Process all of the inserts with a simple insert into live_table select * from temp_table;

    7. Drop the temporary table, go home, and have a frosty beverage.

    This may be oversimplified for your use case, but with a little tweaking it should work a treat.
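    The staging-table workflow above can be sketched end to end. This runnable illustration uses Python's sqlite3 in place of MySQL, so executemany stands in for load data infile and the table and column names are made up; the sequence of SQL statements mirrors steps 1-7:

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    # Hypothetical live table: id 1 is already present and will be updated.
    conn.execute("CREATE TABLE live_table (id INTEGER PRIMARY KEY, val INTEGER)")
    conn.execute("INSERT INTO live_table VALUES (1, 10)")

    # 1. Empty staging table with the same shape as the live table.
    conn.execute("CREATE TABLE temp_table AS SELECT * FROM live_table WHERE 1=0")

    # 2-3. Bulk-load the incoming rows (stand-in for LOAD DATA INFILE),
    #      then index the primary-key column.
    incoming = [(1, 99), (2, 20), (3, 30)]
    conn.executemany("INSERT INTO temp_table VALUES (?, ?)", incoming)
    conn.execute("CREATE INDEX idx_temp_id ON temp_table(id)")

    # 4. Apply updates: rows whose key already exists in the live table.
    conn.execute("""
        UPDATE live_table
        SET val = (SELECT t.val FROM temp_table t WHERE t.id = live_table.id)
        WHERE id IN (SELECT id FROM temp_table)
    """)

    # 5. Remove the update rows from staging, leaving only pure inserts.
    conn.execute("DELETE FROM temp_table WHERE id IN (SELECT id FROM live_table)")

    # 6. Insert the remainder.
    conn.execute("INSERT INTO live_table SELECT * FROM temp_table")

    # 7. Drop the staging table.
    conn.execute("DROP TABLE temp_table")

    result = dict(conn.execute("SELECT id, val FROM live_table ORDER BY id"))
    ```

    Note that the order of steps 4-6 matters: updates must be applied and removed from staging before the final insert, or the insert would violate the primary key. In MySQL, the update in step 4 would more idiomatically be written as a multi-table UPDATE ... JOIN rather than the correlated subquery used here for SQLite compatibility.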
