duandaiqin6080 2016-03-28 20:27

Importing a large file into MySQL on a Heroku app

I need some help.

I have a PHP app on Heroku. In this app, there's a form that uploads a CSV file to be imported into MySQL (ClearDB).

The problem is that the file is large (it will always be large), and the import function takes too long to finish (about 90 seconds). The timeout on Heroku is 30 seconds, and there's no way to change that.

I tried to use Heroku Scheduler (like cron), but the minimum frequency is 10 minutes, so a script that needs 90 seconds would take 30 minutes with this scheduler because, as I said, the Heroku timeout is 30 seconds.

So, what can I do? Is there an alternative scheduler?

Example of the import:

CSV

name,productName,points,categoryName,coordName,date

MYSQL

[users]

userID
userName
categoryID
coordID

[products]

productID
productName

[coords]

coordID
coordName

[categories]

categoryID
categoryName

[points]

pointID
productID
userID
value

For every table, I need to run a SELECT to check whether the category, coord, etc. already exists. If it exists, I return its ID; if not, I insert a new row.

I don't think there's a way to reduce the execution time. I'm trying to find a way to reduce the schedule interval to 2 or 3 minutes, so that all lines are imported in about 10 minutes.

Thanks!

1 answer

  • dqsa17330 2016-03-28 21:21

    This is what I would start with (because it's relatively simple/quick to implement and should give you a reference point and some wiggle room for further tests in a short period of time):

    Import all the data as-is into a temporary table (if the server's RAM allows, you can also try the MEMORY engine).
    Then, after the data has been imported, create the indices needed for the following queries (and check via EXPLAIN or any other tool that shows you if and how the indices are used):

    • query all the categories that are in the temporary table but not in your live data tables
      • create those categories in the live tables.
    • query all coords that are in the temporary table but not in your live data tables.
      • create those coords in the live tables.
    • you get the idea ...repeat for all necessary data.
    • then just import the data from the temp table into the live tables via INSERT...SELECT queries. Think about what kind of transaction/locking you will need for this. It might be that the order of queries will make a difference. But if you're only adding data, I assume that a rather low isolation level should do... not sure though. But maybe that's not your concern right now?
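
The steps above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the answerer's code: it uses Python with the standard-library `sqlite3` module as a stand-in for PHP and MySQL (ClearDB) so that it runs on its own. The `staging` table name, the UNIQUE constraints, the sample rows, and the helper names `create_schema` and `import_csv` are all hypothetical. Table and column names follow the schema in the question, and the sketch assumes each user has a single category and coord.

```python
import csv
import io
import sqlite3

# Hypothetical sample upload; the header matches the CSV in the question.
CSV_TEXT = """name,productName,points,categoryName,coordName,date
alice,widget,10,gold,north,2016-03-01
bob,widget,5,silver,north,2016-03-02
alice,gadget,7,gold,north,2016-03-03
"""


def create_schema(conn):
    # Live tables from the question; the UNIQUE constraints are an assumption.
    conn.executescript("""
        CREATE TABLE categories (categoryID INTEGER PRIMARY KEY, categoryName TEXT UNIQUE);
        CREATE TABLE coords (coordID INTEGER PRIMARY KEY, coordName TEXT UNIQUE);
        CREATE TABLE products (productID INTEGER PRIMARY KEY, productName TEXT UNIQUE);
        CREATE TABLE users (userID INTEGER PRIMARY KEY, userName TEXT UNIQUE,
                            categoryID INTEGER, coordID INTEGER);
        CREATE TABLE points (pointID INTEGER PRIMARY KEY, productID INTEGER,
                             userID INTEGER, value INTEGER);
    """)


def import_csv(conn, csv_text):
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn.execute("DROP TABLE IF EXISTS staging")
    with conn:  # commit everything together, or roll back on error
        # 1. Load the file as-is into a staging table (no lookups per row).
        conn.execute("CREATE TABLE staging (name TEXT, productName TEXT, points TEXT,"
                     " categoryName TEXT, coordName TEXT, date TEXT)")
        conn.executemany(
            "INSERT INTO staging VALUES"
            " (:name, :productName, :points, :categoryName, :coordName, :date)", rows)

        # 2. Index the staging columns after loading, then create whatever
        #    is in staging but missing from each live lookup table.
        conn.execute("CREATE INDEX idx_staging_name ON staging (name)")
        for table, col in (("categories", "categoryName"),
                           ("coords", "coordName"),
                           ("products", "productName")):
            conn.execute(f"CREATE INDEX idx_staging_{col} ON staging ({col})")
            conn.execute(
                f"INSERT INTO {table} ({col}) "
                f"SELECT DISTINCT s.{col} FROM staging s "
                f"LEFT JOIN {table} t ON t.{col} = s.{col} "
                f"WHERE t.{col} IS NULL")

        # 3. Create missing users, resolving their category and coord IDs.
        conn.execute("""
            INSERT INTO users (userName, categoryID, coordID)
            SELECT s.name, MIN(c.categoryID), MIN(o.coordID)
            FROM staging s
            JOIN categories c ON c.categoryName = s.categoryName
            JOIN coords o ON o.coordName = s.coordName
            LEFT JOIN users u ON u.userName = s.name
            WHERE u.userID IS NULL
            GROUP BY s.name""")

        # 4. Move the actual data with a single INSERT ... SELECT.
        conn.execute("""
            INSERT INTO points (productID, userID, value)
            SELECT p.productID, u.userID, CAST(s.points AS INTEGER)
            FROM staging s
            JOIN products p ON p.productName = s.productName
            JOIN users u ON u.userName = s.name""")
    conn.execute("DROP TABLE staging")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    import_csv(conn, CSV_TEXT)
    for table in ("categories", "coords", "products", "users", "points"):
        print(table, conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0])
```

The `INSERT ... SELECT` with `LEFT JOIN ... IS NULL` pattern is plain SQL, so it should carry over to MySQL with little change, where `staging` could be a TEMPORARY table. Note that step 4 appends rows on every run, so importing the same file twice duplicates its points.
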
    This answer was selected by the asker as the best answer.
