dpxnrx11199 2016-09-24 06:33

Ordering command-script runs correctly to minimize total runtime

I am using scrapy and scrapyd to crawl some content. I have 28 crawlers, but only 8 run at a time. Each crawl takes from 10 minutes to several hours to complete, so I'm looking for a way to order them correctly in order to minimize the time the server is active.

I already gather data on how long each crawl takes, so what remains is only the minimization problem itself, i.e. how to formulate it.

The script is started from PHP, so the solution should preferably run in PHP.
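What the question describes is makespan minimization on identical parallel machines (28 jobs, 8 slots), which is NP-hard in general; the standard heuristic is LPT (Longest Processing Time first): sort the crawls by measured duration, longest first, and always assign the next one to the currently least-loaded slot, which is guaranteed to stay within 4/3 of the optimal makespan. A minimal sketch in Python (the crawlers' own language; the spider names and durations below are made up for illustration):

```python
import heapq

def lpt_schedule(durations, workers=8):
    """LPT heuristic: assign each job, longest first, to the
    currently least-loaded worker slot. Returns a list of
    (total_load, slot_index, jobs) tuples."""
    # Min-heap keyed on (current_load, slot_index); slot_index keeps
    # comparisons unambiguous when loads tie.
    slots = [(0, i, []) for i in range(workers)]
    heapq.heapify(slots)
    for job, dur in sorted(durations.items(), key=lambda kv: -kv[1]):
        load, i, jobs = heapq.heappop(slots)
        jobs.append(job)
        heapq.heappush(slots, (load + dur, i, jobs))
    return slots

# Hypothetical crawl times in minutes, 3 slots to keep the example small
times = {"site1": 240, "site2": 180, "site3": 95, "site4": 60,
         "site5": 45, "site6": 30, "site7": 20, "site8": 10}
schedule = lpt_schedule(times, workers=3)
makespan = max(load for load, _, _ in schedule)
```

The same greedy loop ports directly to PHP with `usort` and a plain array of slot loads; the heap is only a convenience for finding the least-loaded slot.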


2 answers

  • duanbei7035 2016-09-24 07:52

    The best way I've found is to set them up as cron jobs that execute at specific times. I have around 30 cron jobs configured to start at various times, which lets you set a specific start time per scraper.

    Executing a PHP command by cron job at 5pm every day (note the minute field must be `0`, not `*`, or the job fires every minute of that hour):

    0 17 * * * php /opt/test.php
    

    If you execute the scrapy Python command via cron job, it's:

    0 17 * * * cd /opt/path1/ && scrapy crawl site1
    

    If you're using virtualenv for your Python, then it's (cron runs jobs under `/bin/sh` by default, where `.` is the portable equivalent of `source`):

    0 17 * * * . /opt/venv/bin/activate && cd /opt/path1/ && scrapy crawl site1
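Fixed cron start times are coarse for this workload: a crawl that finishes early leaves its slot idle until the next scheduled time. An alternative is a small driver script that keeps at most 8 crawls in flight and starts the next one the moment a slot frees. A sketch in Python, not PHP, as a hedged illustration; the `durations` dict and the `run` callable are placeholders:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_all(durations, run, max_workers=8):
    """Launch crawls longest-first with at most `max_workers` running
    at once; the pool starts the next crawl as soon as a slot frees,
    approximating an LPT schedule without precomputing assignments."""
    order = sorted(durations, key=durations.get, reverse=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(run, order))  # blocks until every crawl is done

# In the real script, `run` would shell out to scrapy; the path and
# spider names are illustrative:
# run = lambda name: subprocess.run(["scrapy", "crawl", name], cwd="/opt/path1")
```

The whole driver can then be launched from PHP (e.g. via `exec()`) or from a single cron entry, so the server only stays up for the duration of the longest slot rather than until the last fixed start time.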
    
