dongqiaolong9034 2017-04-03 14:09

A better way to handle scraping a large amount of data concurrently

I am building a web app that works as follows:

1) A user registers.

2) After the user registers, I run a queued job that scrapes 60k+ customer records. The data comes from a third-party API, and I fetch it with curl.

3) After scraping the data, I store it in the database.

4) The third-party API paginates its results, so I check each response for a next page (nextPageUrl). If one is present, I curl that URL, fetch the next batch of customer data, and store it as well. This continues until the response no longer contains a nextPageUrl.

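// This is pseudocode.

RegisterUser($user);
CallThirdPartyAPI();

function RegisterUser($user) {
    insert_in_users_table($user);
}

// Fetch one page, store its customers, then follow nextPageUrl if present.
function CallThirdPartyAPI($url = null) {
    // One page of the API response; the "customers" field name is illustrative.
    $response = get_all_customers($url);
    foreach ($response->customers as $cust) {
        store_in_customers_table($cust);
    }
    // The next-page link belongs to the response, not to each customer.
    if (!empty($response->nextPageUrl)) {
        return CallThirdPartyAPI($response->nextPageUrl);
    }
    return false;
}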
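
For comparison, here is a minimal Python sketch of the same pagination flow, written as a loop instead of recursion and storing each page in one batch. It is only an illustration under stated assumptions: `fetch_page`, `fake_fetch`, the `PAGES` data, and the `customers` table layout are all hypothetical stand-ins, and SQLite is used only so the example runs without a database server. Only `nextPageUrl` comes from the description above.

```python
import sqlite3
from typing import Callable, Optional


def scrape_all_pages(fetch_page: Callable[[Optional[str]], dict],
                     conn: sqlite3.Connection,
                     start_url: Optional[str] = None) -> int:
    """Follow nextPageUrl until it is absent; return how many rows were stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)"
    )
    url, stored = start_url, 0
    while True:
        page = fetch_page(url)  # one HTTP request per page in a real system
        rows = [(c["id"], c["name"]) for c in page["customers"]]
        # One batched insert per page instead of one statement per customer.
        conn.executemany(
            "INSERT OR REPLACE INTO customers (id, name) VALUES (?, ?)", rows
        )
        conn.commit()
        stored += len(rows)
        url = page.get("nextPageUrl")
        if not url:
            return stored


# Hypothetical two-page API response, used in place of a real curl/HTTP call.
PAGES = {
    None: {"customers": [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}],
           "nextPageUrl": "page2"},
    "page2": {"customers": [{"id": 3, "name": "Cy"}]},
}


def fake_fetch(url: Optional[str]) -> dict:
    return PAGES[url]


conn = sqlite3.connect(":memory:")
print(scrape_all_pages(fake_fetch, conn))  # prints 3
```

A loop avoids deep recursion when there are many pages, and batching the writes per page reduces the number of round trips to the database.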

As you can see, this is fine when only one user registers at a time. But with 100+ users registering, it becomes a problem: scraping takes 20-30 minutes per user, and my job queue runs only 2 jobs at a time, so every other job has to wait until the running jobs finish.
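
To put numbers on the bottleneck: with 100 users, 2 jobs at a time, and 20-30 minutes per job, the queue needs 50 rounds, so the last user waits roughly 1000-1500 minutes (about 17-25 hours). One possible way to reduce that wait, sketched below in Python, is to make each job handle a single page and re-enqueue the next page, so pages from different users interleave instead of one user holding a worker for the whole scrape. This is an assumption-laden illustration, not a description of the actual system: `FAKE_API`, `fetch_page`, and `run_queue` are hypothetical names, and an in-memory deque stands in for the real job queue.

```python
from collections import deque

# Hypothetical paginated data: (user, page_url) -> (customers, next_page_url)
FAKE_API = {
    ("alice", None): (["a1", "a2"], "p2"),
    ("alice", "p2"): (["a3"], None),
    ("bob", None): (["b1"], None),
}


def fetch_page(user, url):
    """Stand-in for one curl request to the third-party API."""
    return FAKE_API[(user, url)]


def run_queue(users):
    """Process one page per job; a job re-enqueues its own next page."""
    queue = deque((user, None) for user in users)
    stored = []
    while queue:
        user, url = queue.popleft()
        customers, next_url = fetch_page(user, url)
        stored.extend((user, c) for c in customers)  # stand-in for the DB insert
        if next_url:
            # The follow-up page goes to the back of the queue,
            # so other users' pages are handled in between.
            queue.append((user, next_url))
    return stored


print(run_queue(["alice", "bob"]))
# prints [('alice', 'a1'), ('alice', 'a2'), ('bob', 'b1'), ('alice', 'a3')]
```

In the output, bob's only page is processed before alice's second page. Short jobs also make it safer to raise the number of workers, since the work is mostly waiting on network responses, provided the third-party API's rate limits allow it.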

I am looking for a better approach that makes the system more efficient.

Any suggestions would be greatly appreciated.

PS:

I run the job queue through supervisor.

I have a read replica for my database: I write to the master and read from the replica to reduce CPU usage on the database.


2 answers

  • dounue6984 2017-04-03 18:21

    Are you using an SQL database? Have you considered a NoSQL database such as MongoDB? I had a similar issue, using curl to fetch a huge amount of data. In my case MongoDB was more efficient and faster, and you can store the data as JSON documents or arrays, however you like. You can also use MongoDB for the API data and an SQL database for everything else.
