duandaiqin6080 2016-03-28 20:27
浏览 66
已采纳

在Heroku应用程序上将大文件导入mysql

I need some help.

I have an php app on Heroku. In this app, there's a form that upload an csv file, to be imported on Mysql(cleardb).

The problem it's, that the file it's large (will always be large), and the function it's taking too much time to finish (about 90 seconds). The timeout on heroku it's 30 seconds, and there's no way to change that.

I tried to use Heroku Scheduler (like cron), but the minimal frequency it's 10 minutes, and a script that can take 90 seconds, using this scheduler, will take 30 minutes, because as i said, the timeout of heroku it's 30 seconds.

Well, what can i do? there's an alternative scheduler?

Example of the import:

CSV

name,productName,points,categoryName,coordName,date

MYSQL

[users]

userID
userName
categoryID
coordID

[products]

productID
productName

[coords]

coordID
coordName

[categories]

categoryID
categoryName

[points]

pointID
productID
userID
value

in all tables, i need to make a select to see if the category, coord, etc, already exists. If exists, return id, if not, insert a new line.

I dont think that there's a way to decrease time execution time. I'm trying to find a way to decrease the schedule to 2 minutes, 3 minutes, etc. So, in about 10 minutes, all lines will be imported.

thanks!

  • 写回答

1条回答 默认 最新

  • dqsa17330 2016-03-28 21:21
    关注

    This is what I would start with (because it's relatively simple/quick to implement and should give you a reference point and some wiggle room for further tests in a short period of time):

    Import all the data as-is into a temporary table (if the server's RAM allow you can also try the memory engine).
    Then, after the data has been imported, create the indices needed for the following queries (and check via EXPLAIN or any other tool that shows you if and how the indices are used):

    • query all the categories that are in the temporary table but not in your live data tables
      • create those categories in the live tables.
    • query all coords that are in the temporary table but not in your live data tables.
      • create those coords in the live tables.
    • you get the idea ...repeat for all necessary data.
    • then just import the data from the temp table into the live tables via INSERT...SELECT queries. Think about what kind of transaction/locking you will need for this. It might be that the order of queries will make a difference. But if you're only adding data, I assume that a rather low isolation level should do... not sure though. But maybe that's not your concern right now?
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥50 wordpress项目注册报失败刷新后其实是成功状态,请求排查原因
  • ¥20 linxu服务器僵尸进程不释放,代码如何修改?
  • ¥15 pycharm激活不成功
  • ¥40 如果update 一个列名为参数的value
  • ¥15 基于51单片机的水位检测系统设计中LCD1602一直不显示
  • ¥15 OCS2安装出现问题,请大家给点意见
  • ¥15 ros小车启动launch文件报错
  • ¥15 vs2015到期想登陆但是登陆不上
  • ¥15 IPQ5018制作烧录固件,boot运行失败(操作系统-linux)(相关搜索:操作系统)(相关搜索:操作系统)
  • ¥20 icefall在librispeech基础上加入个人数据集