2015-08-16 19:48
浏览 141


Looking for insight on the best approach for large csv file imports to mysql and managing the dataset. This is for an ecommerce storefront "startup". All product data will be read from csv files which are download via curl (server to server).

Each csv file represents a different supplier/warehouse with up to 100,000 products. In total there are roughly 1.2 million products spread over 90-100 suppliers. At least 75% of the row data (51 columns) is redundant garbage and will not be needed.

Would it be better to use mysqli LOAD DATA LOCAL INFILE to 'temp_products' table. Then, make the needed data adjustments per row, then insert to the live 'products' table or simply use fgetcsv() and go row by row? The import will be handled by a CronJob using the sites php.ini with a memory limit of 128M.

  • Apache V2.2.29
  • PHP V5.4.43
  • MySQL V5.5.42-37.1-log
  • memory_limit 128M

I'm not looking for "How to's". I'm simply looking for the "best approach" from the communities perspective and experience.

图片转代码服务由CSDN问答提供 功能建议

寻找有关大型csv文件导入mysql和管理数据集的最佳方法的见解。 这是一个电子商务店面“启动”。 所有产品数据都将从csv文件中读取,这些文件通过curl(服务器到服务器)下载。

每个csv文件代表一个不同的供应商/仓库,最多100,000个产品。 总共有大约120万种产品分布在90-100家供应商中。 至少75%的行数据(51列)是冗余垃圾,不需要。

将mysqli LOAD DATA LOCAL INFILE用于'temp_products'表是否更好? 然后,每行进行所需的数据调整,然后插入实时'产品'表或简单地使用fgetcsv()并逐行进行? 导入将由CronJob使用站点php.ini处理,内存限制为128M。

  • Apache V2.2.29
  • PHP V5.4.43
  • MySQL V5.5.42-37.1-log
  • memory_limit 128M

    I 我不是在寻找“怎么样”。 我只是从社区的角度和经验中寻找“最佳方法”。

  • 写回答
  • 好问题 提建议
  • 关注问题
  • 收藏
  • 邀请回答

1条回答 默认 最新

  • doujuanxun7167 2015-08-16 21:27

    I have direct experience of doing something virtually identical to what you describe -- lots of third party data sources in different formats all needing to go into a single master table.

    I needed to take different approaches for different data sources, because some were in XML, some in CSV, some large, some small, etc. For the large CSV ones, I did indeed follow roughly your suggested routed:

    • I used LOAD DATA INFILE to dump the raw contents into a temporary table.
    • I took the opportunity to transform or discard some of the data within this query; LOAD DATA INFILE allows some quite complex queries. This allowed me to use the same temp table for several of the import processes even though they had quite different CSV data, which made the next step easier.
    • I then used a set of secondary SQL queries to pull the temp data into the various main tables. All told, I had about seven steps to the process.

    I had a set of PHP classes to do the imports, which all implemented a common interface. This meant that I could have a common front-end program which could run any of the importers.

    Since a lot of the importers did similar tasks, I put the commonly used code in traits so that the code could be shared.

    Some thoughts based on the things you said in your question:

    • LOAD DATA INFILE will be orders of magnitude quicker than fgetcsv() with a PHP loop.
    • LOAD DATA INFILE queries can be very complex and achieve very good data mapping without ever having to run any other code, as long as the imported data is going into a single table.
    • Your memory limit is likely to need to be raised. However, using LOAD DATA INFILE means that it will be MySQL which will use the memory, not PHP, so the PHP limit won't come into play for that. 128M is still likely to be too low for you though. -If you struggle to import the whole thing in one go, try using some simple Linux shell commands to split the file into several smaller chunks. CSV data format should make that fairly simple.
    解决 无用
    打赏 举报

相关推荐 更多相似问题