I want to insert a JSON file (also available as CSV) into a MySQL database using the CakePHP framework. The basics are clear, but the surrounding requirements make it difficult:
- The JSON/CSV file is large (approx. 200 MB, up to 200,000 lines).
- The file contains several fields. These fields need to be mapped to fields with different names in the MySQL database.
- The CSV contains a field named art_number. This field is also present in the MySQL database. The art_number is unique, but it is not the primary key in MySQL. I want to update the MySQL record if the CSV and the database have the same art_number; if not, a new record should be created.
- Several fields of the CSV file need to be processed before they are stored, and additional fields need to be added.
- The CSV contains an image_URL. If the record is NEW to the database (unknown art_number), this image should be copied, modified (with Imagick), and stored on the server.
- The whole job needs to run on a daily basis.
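Given the 200 MB file size, the first hard constraint is reading the CSV line by line instead of loading it into memory at once. A minimal sketch in plain PHP (the file path, batch size, and the `processBatch()` helper are assumptions for illustration):

```php
<?php
// Stream the CSV so memory usage stays flat regardless of file size.
$handle = fopen('/path/to/import.csv', 'r');
if ($handle === false) {
    throw new RuntimeException('Could not open import file');
}

$header = fgetcsv($handle);   // first line: column names
$batch = [];
$batchSize = 500;             // tune to available memory

while (($row = fgetcsv($handle)) !== false) {
    $batch[] = array_combine($header, $row);  // map columns to names

    if (count($batch) >= $batchSize) {
        processBatch($batch); // hypothetical: upsert this chunk
        $batch = [];
    }
}
if ($batch) {
    processBatch($batch);     // flush the remainder
}
fclose($handle);
```

Working in fixed-size batches also gives you natural transaction boundaries and restart points.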
As you can see, there is a lot going on, with some limitations (memory, runtime, etc.), and I am not sure how to approach this from an architecture point of view. For example, should I first insert everything into a separate "import" database table and then run through the remaining steps separately? What is a good way to map the database IDs to the CSV lines? CakePHP can either create a new record or update an existing one if I manage to map the ID based on the art_number. Also, converting and copying up to 200,000 images seems to be a big issue. So how do I break this down into smaller chunks?
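For the art_number mapping, the CakePHP ORM can make the create-or-update decision per row without you tracking primary keys yourself. A hedged sketch, assuming CakePHP 3.x, a hypothetical `Products` table, and an invented field mapping and `image_pending` flag column:

```php
<?php
use Cake\ORM\TableRegistry;

// Upsert one CSV row keyed on art_number. All field names are assumptions.
function upsertRow(array $csvRow): void
{
    $products = TableRegistry::getTableLocator()->get('Products');

    // Map CSV column names to database column names, with example processing.
    $data = [
        'art_number' => $csvRow['art_number'],
        'name'       => trim($csvRow['title']),        // example cleanup step
        'price'      => (float)$csvRow['price_gross'], // example type cast
    ];

    $existing = $products->find()
        ->where(['art_number' => $data['art_number']])
        ->first();

    if ($existing === null) {
        // Unknown art_number: create the record and mark it so a later
        // image job knows it still has to fetch and convert the image.
        $data['image_pending'] = true;                 // hypothetical column
        $entity = $products->newEntity($data);
    } else {
        $entity = $products->patchEntity($existing, $data);
    }

    $products->saveOrFail($entity);
}
```

Flagging new records instead of processing images inline is one way to decouple the fast database pass from the slow Imagick pass.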
I would appreciate your help in finding the right strategy here. What do I need to consider in terms of memory and speed? Does it make sense to split the process into different jobs? If so, how would you do that?
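One common pattern for the daily run is to split the pipeline into separate CakePHP console commands that cron runs in sequence, so each stage can fail and restart independently. The command names, paths, and chunk limit below are assumptions, not a prescribed setup:

```shell
# Hypothetical crontab: import the rows first, then work through pending
# images in bounded chunks so a single run never touches all 200,000.
0 2 * * * cd /var/www/app && bin/cake csv_import >> logs/import.log 2>&1
0 3 * * * cd /var/www/app && bin/cake image_process --limit 2000 >> logs/images.log 2>&1
```

A chunked image job that picks up where it left off also keeps each cron run within a predictable memory and runtime budget.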