duandaiqin6080 2016-03-28 20:27
浏览 66
已采纳

在Heroku应用程序上将大文件导入mysql

I need some help.

I have an php app on Heroku. In this app, there's a form that upload an csv file, to be imported on Mysql(cleardb).

The problem it's, that the file it's large (will always be large), and the function it's taking too much time to finish (about 90 seconds). The timeout on heroku it's 30 seconds, and there's no way to change that.

I tried to use Heroku Scheduler (like cron), but the minimal frequency it's 10 minutes, and a script that can take 90 seconds, using this scheduler, will take 30 minutes, because as i said, the timeout of heroku it's 30 seconds.

Well, what can i do? there's an alternative scheduler?

Example of the import:

CSV

name,productName,points,categoryName,coordName,date

MYSQL

[users]

userID
userName
categoryID
coordID

[products]

productID
productName

[coords]

coordID
coordName

[categories]

categoryID
categoryName

[points]

pointID
productID
userID
value

in all tables, i need to make a select to see if the category, coord, etc, already exists. If exists, return id, if not, insert a new line.

I dont think that there's a way to decrease time execution time. I'm trying to find a way to decrease the schedule to 2 minutes, 3 minutes, etc. So, in about 10 minutes, all lines will be imported.

thanks!

  • 写回答

1条回答 默认 最新

  • dqsa17330 2016-03-28 21:21
    关注

    This is what I would start with (because it's relatively simple/quick to implement and should give you a reference point and some wiggle room for further tests in a short period of time):

    Import all the data as-is into a temporary table (if the server's RAM allow you can also try the memory engine).
    Then, after the data has been imported, create the indices needed for the following queries (and check via EXPLAIN or any other tool that shows you if and how the indices are used):

    • query all the categories that are in the temporary table but not in your live data tables
      • create those categories in the live tables.
    • query all coords that are in the temporary table but not in your live data tables.
      • create those coords in the live tables.
    • you get the idea ...repeat for all necessary data.
    • then just import the data from the temp table into the live tables via INSERT...SELECT queries. Think about what kind of transaction/locking you will need for this. It might be that the order of queries will make a difference. But if you're only adding data, I assume that a rather low isolation level should do... not sure though. But maybe that's not your concern right now?
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 在若依框架下实现人脸识别
  • ¥15 网络科学导论,网络控制
  • ¥100 安卓tv程序连接SQLSERVER2008问题
  • ¥15 利用Sentinel-2和Landsat8做一个水库的长时序NDVI的对比,为什么Snetinel-2计算的结果最小值特别小,而Lansat8就很平均
  • ¥15 metadata提取的PDF元数据,如何转换为一个Excel
  • ¥15 关于arduino编程toCharArray()函数的使用
  • ¥100 vc++混合CEF采用CLR方式编译报错
  • ¥15 coze 的插件输入飞书多维表格 app_token 后一直显示错误,如何解决?
  • ¥15 vite+vue3+plyr播放本地public文件夹下视频无法加载
  • ¥15 c#逐行读取txt文本,但是每一行里面数据之间空格数量不同