I have a database with just over 70 million rows in it. The data was originally parsed and imported from roughly 70,000 XML files. These files are updated every week, so I need to scan through them (via a cron job that runs at 2 AM on Sundays) and update rows that have changed and insert new rows.
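For reference, the crontab entry is along these lines (the script path here is a placeholder, not the real one):

0 2 * * 0 /usr/bin/php /path/to/import.php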
$operatorSQL = "INSERT IGNORE INTO `operator` (`reference`, `national_operator_code`, `operator_code`, `operator_short_name`, `operator_name_on_license`, `trading_name`) VALUES (:reference, :nationalOperatorCode, :operatorCode, :operatorShortName, :operatorNameOnLicense, :tradingName);";
$serviceSQL = "INSERT IGNORE INTO `service` (`service_code`, `private_code`, `date_start`, `date_end`, `mode`, `description`, `origin`, `destination`) VALUES (:serviceCode, :privateCode, :dateStart, :dateEnd, :mode, :description, :origin, :destination);";
$serviceOperatorSQL = "INSERT IGNORE INTO `service_operator` (`service_code`, `operator_reference`) VALUES (:serviceCode, :operatorReference);";
$journeyPatternSQL = "INSERT IGNORE INTO `journey_pattern` (`reference`, `direction`, `destination_display`, `vehicle_type_code`, `vehicle_type_description`) VALUES (:reference, :direction, :destinationDisplay, :vehicleTypeCode, :vehicleTypeDescription);";
$journeyPatternRouteSQL = "INSERT IGNORE INTO `journey_pattern_route` (`journey_pattern_reference`, `route_reference`) VALUES (:reference, :routeReference);";
$journeyPatternSectionLink = "INSERT IGNORE INTO `journey_pattern_section_link` (`journey_pattern_reference`, `journey_pattern_section_reference`) VALUES (:reference, :journeyPatternSectionReference);";
$journeyPatternSectionSQL = "INSERT IGNORE INTO `journey_pattern_section` (`reference`) VALUES (:reference);";
$lineSQL = "INSERT IGNORE INTO `service_line` (`service_code`, `name`) VALUES (:serviceCode, :name);";
$timingLinkSQL = "INSERT IGNORE INTO `journey_pattern_timing_link` (`reference`, `stop_from`, `stop_from_timing`, `stop_from_sequence_number`, `stop_from_activity`, `stop_to`, `stop_to_timing`, `stop_to_sequence`, `stop_to_activity`, `run_time`, `direction`) VALUES (:reference, :stopFrom, :stopFromTiming, :stopFromSequenceNumber, :stopFromActivity, :stopTo, :stopToTiming, :stopToSequenceNumber, :stopToActivity, :runTime, :direction);";
$timingLinkJpsSQL = "INSERT INTO `journey_pattern_timing_link_jps` (`journey_pattern_timing_link`, `journey_pattern_section_reference`) VALUES (:linkReference, :sectionReference);";
$timingLinkRouteLinkRefSQL = "INSERT INTO `journey_pattern_timing_link_rlr` (`journey_pattern_timing_link`, `route_link_reference`) VALUES (:linkReference, :routeLinkReference);";
$routeSQL = "INSERT IGNORE INTO `route` (`reference`, `private_code`, `description`) VALUES (:reference, :privateCode, :description);";
$routeSectionSQL = "INSERT IGNORE INTO `route_section` (`reference`) VALUES (:reference);";
$routeLinkSQL = "INSERT IGNORE INTO `route_link` (`reference`, `stop_from`, `stop_to`, `direction`, `distance`) VALUES (:reference, :stopFrom, :stopTo, :direction, :distance);";
$routeLinkSectionSQL = "INSERT INTO `route_link_section` (`route_link_reference`, `route_section_reference`) VALUES (:routeLinkReference, :routeSectionReference);";
$vehicleJourneySQL = "INSERT IGNORE INTO `vehicle_journey` (`reference`, `private_code`, `departure`) VALUES (:reference, :privateCode, :departure);";
$vehicleJourneyServiceSQL = "INSERT IGNORE INTO `vehicle_journey_service` (`vehicle_journey_reference`, `service_reference`) VALUES (:reference, :serviceRef);";
$vehicleJourneyLineSQL = "INSERT IGNORE INTO `vehicle_journey_line` (`vehicle_journey_reference`, `service_line_reference`) VALUES (:reference, :lineRef);";
$vehicleJourneyJpSQL = "INSERT IGNORE INTO `vehicle_journey_jp` (`vehicle_journey_reference`, `journey_pattern_reference`) VALUES (:reference, :journeyPatternRef);";
Above are all of the SQL queries that are performed. You will notice that most of them use the IGNORE clause on the INSERT statement; this is just to make sure that if any files contain duplicate data, no errors will stop the script. Instead, the duplicate row is simply skipped and the script moves on.
I don't feel this is the most efficient way of doing it, however: when I run the script again after the initial insert of all the data, it's just as slow as the original import. Surely if 99.9% of the rows are unchanged it should skim through? Any ideas why this is happening?