dongzhong5756 2015-10-06 09:11

PHP: matching JSON/CSV against a SQL database, with many adjustments (CakePHP)

I want to insert a JSON file (also available as CSV) into a MySQL database using the CakePHP framework. The basics are clear, but the surrounding requirements make it difficult:

  1. The JSON/CSV file is large (approx. 200 MB and up to 200,000 lines).
  2. The file contains several fields. These fields need to be mapped to fields with different names in the MySQL database.
  3. The CSV contains a field named art_number. This field is also present in the MySQL database. The art_number is unique, but it is not the primary key in MySQL. I want to update the MySQL record if the CSV and the database have the same art_number. If not, a new record should be created.
  4. Several fields of the CSV file need to be processed before they are stored. Additional fields also need to be added.
  5. The CSV contains an image_URL. If the record is NEW to the database (unknown art_number), this image should be copied, modified (with Imagick) and stored on the server (see the sketch after this list).
  6. The whole job needs to run on a daily basis.
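
For the image step (5.), I imagine something like this minimal sketch, assuming the Imagick extension is installed; the URL, the target path and the 300x300 size are placeholders:

```php
<?php
// Fetch a remote product image, modify it and store it locally (sketch only).
// $imageUrl, $targetPath and the 300x300 size are placeholders.
function importImage($imageUrl, $targetPath)
{
    $blob = file_get_contents($imageUrl); // copy the remote image
    if ($blob === false) {
        return false; // download failed, skip this record
    }
    $image = new Imagick();
    $image->readImageBlob($blob);
    $image->thumbnailImage(300, 300, true); // example modification: fit into 300x300
    $image->setImageFormat('jpeg');
    $image->writeImage($targetPath); // store on the server
    $image->destroy();
    return true;
}
```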

As you can see, there is a lot going on, with some limitations (memory, runtime etc.). But I am not sure how to approach this from an architecture point of view. E.g. should I first try to insert everything into a separate "import" database table and then run through the steps separately? What is a good way to map the database IDs to the CSV lines? CakePHP is able to either create a new record or update an existing one if I can map the ID based on the art_number (a sketch of what I mean follows below). Also, changing and copying up to 200,000 images seems to be a big issue. So how can I break this down into smaller chunks?
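
The create-or-update step I have in mind would look roughly like this, assuming CakePHP 2.x and a hypothetical Product model whose table has the art_number field:

```php
<?php
// Create-or-update a record by art_number (CakePHP 2.x sketch).
// The Product model and the field names are assumptions.
$id = $this->Product->field('id', array('Product.art_number' => $row['art_number']));

$this->Product->create();
if ($id) {
    $row['id'] = $id; // setting the primary key makes save() update instead of insert
}
$this->Product->save($row);
```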

I would appreciate it if you could help me find the right strategy here. What do I need to consider in terms of memory and speed? Does it make sense to split the process into different jobs? What/how would you do that?


1 Answer

  • duanlan5320 2015-10-06 11:08

    I would appreciate it if you could help me find the right strategy here. What do I need to consider in terms of memory and speed?

    • Use a shell for the import (see the sketch after this list).
    • Read the data in chunks of X lines or X amount of data to avoid memory problems, and then process these chunks. It is a simple loop.
    • If the processing is going to take a long time, consider using a job queue like Resque. You can report the progress to the user if needed.
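
    A minimal shell sketch covering the first two points, assuming CakePHP 2.x; the ImportShell name, the Product model, the CSV path, the column mapping and the chunk size of 1000 lines are all assumptions:

```php
<?php
// app/Console/Command/ImportShell.php -- chunked CSV import (CakePHP 2.x sketch).
App::uses('AppShell', 'Console/Command');

class ImportShell extends AppShell
{
    public $uses = array('Product'); // hypothetical model

    public function main()
    {
        $handle = fopen('/path/to/import.csv', 'r'); // placeholder path
        $chunk = array();

        while (($line = fgetcsv($handle)) !== false) {
            // Map the CSV columns to the database field names (example mapping).
            $chunk[] = array(
                'art_number' => $line[0],
                'name'       => $line[1],
            );
            if (count($chunk) >= 1000) { // assumed chunk size
                $this->processChunk($chunk);
                $chunk = array(); // discard the chunk to keep memory flat
            }
        }
        if ($chunk) {
            $this->processChunk($chunk); // remaining lines
        }
        fclose($handle);
    }

    protected function processChunk(array $chunk)
    {
        foreach ($chunk as $row) {
            // Create-or-update by art_number, as described in the question.
            $id = $this->Product->field('id', array('Product.art_number' => $row['art_number']));
            if ($id) {
                $row['id'] = $id;
            }
            $this->Product->create();
            $this->Product->save($row);
        }
    }
}
```

    You would run it with Console/cake import; because each chunk is discarded after processing, memory use stays flat regardless of the file size.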

    Does it make sense to split the process into different jobs? What/how would you do that?

    This depends on the requirements, on how long the processing takes, and on how much your system can process in parallel without hitting 100% CPU usage and effectively slowing down the site. If that happens, move the processing to another machine or lower the priority of the process with the nice command.
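
    For the daily run, a crontab sketch combining the shell above with nice; the installation path and log file are placeholders:

```
# Nightly import at 02:00 with the lowest CPU priority (nice 19).
0 2 * * * nice -n 19 /var/www/app/Console/cake import >> /var/log/import.log 2>&1
```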

    Accepted as the best answer by the asker.
