doumin4553 2016-05-04 20:47
Accepted

Can I use curl_multi_init on the same API with different parameters?

So I am connecting to the https://genderize.io/ API. I want to scrape from this API as fast as possible because I might need to do 1,000,000 searches at a time. Is it possible to create 100,000 different curl_init handles (10 names per request), each with different parameters, and then execute them all in parallel? It seems too good to be true if I could. Also, if I can't do this, how else can I speed up the requests? My current code uses one curl_init handle and changes the URL on each iteration of a for loop. Here is my current loop:

$ch3 = curl_init();
curl_setopt($ch3, CURLOPT_RETURNTRANSFER, 1);
for ($x = 0; $x < $loopnumber; $x = $x + 10) {
    // Build the query string from up to 10 names, URL-encoding each one.
    $params = array();
    foreach (array_slice($firstnames, $x, 10) as $i => $name) {
        $params[] = 'name[' . $i . ']=' . urlencode($name);
    }
    curl_setopt($ch3, CURLOPT_URL, 'https://api.genderize.io?' . implode('&', $params));
    $resp3 = curl_exec($ch3);
    echo $resp3;
    $genderresponse = json_decode($resp3, true);
    // (the rest of the loop body is cut off in the original post)
}

1 answer

  • doutang2382 2016-05-04 21:45

    TL;DR

    Yes, it is possible in theory. But no, it won't work in practice. You'd better stay within a few hundred parallel connections.

    The longer story

    You will probably run out of sockets and possibly memory before you can create one million easy handles and add them to a libcurl multi handle.

    If you communicate with a single remote IP address and port number from a single local IP address, each connection needs its own local port number, so you can't have more than 64K connections in parallel, even in theory. You won't even get to 64K on most operating systems in their default configuration. (You can do more if you talk to more remote IPs or have more local IPs to bind the connections to.)
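
    To see how far below 64K a default setup can be, here is a minimal sketch that counts the local ports available for outgoing connections. It assumes a Linux host, where the range is exposed in /proc/sys/net/ipv4/ip_local_port_range; the helper name ephemeralPortCount is made up for this illustration, and other operating systems expose the range differently.

```php
<?php
// Hypothetical helper: count the ports in a "low high" range string,
// the format Linux uses in /proc/sys/net/ipv4/ip_local_port_range.
function ephemeralPortCount(string $range): int
{
    [$low, $high] = array_map('intval', preg_split('/\s+/', trim($range)));
    return $high - $low + 1; // the range is inclusive on both ends
}

// Assumption: Linux only. On other systems the file does not exist
// and nothing is printed.
$path = '/proc/sys/net/ipv4/ip_local_port_range';
if (is_readable($path)) {
    echo ephemeralPortCount(file_get_contents($path)),
        " local ports for one local IP talking to one remote IP and port\n";
}
```

    A range of 32768 to 60999, for example, gives 28,232 ports, well under half of the theoretical 64K.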

    For the sake of argument, assume you actually reach 60K simultaneous connections: you'll then find that the curl_multi_* API slows to a crawl with that many connections, because it is select/poll based. libcurl itself has an event-based API that is recommended once you go beyond perhaps a few hundred parallel connections, but from within PHP you have no way to access or use it.
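
    Staying within a few hundred parallel connections could look roughly like the sketch below: it keeps a bounded "window" of transfers on one multi handle and adds a new request each time one finishes. The function names buildGenderizeUrls and fetchWithWindow and the default window of 100 are assumptions made for this illustration, not part of PHP or of the genderize.io API, and any rate limits on the API side are not handled here.

```php
<?php
// Hypothetical helper: split the names into batches and build one URL per batch.
function buildGenderizeUrls(array $names, int $perRequest = 10): array
{
    $urls = [];
    foreach (array_chunk($names, $perRequest) as $batch) {
        $params = [];
        foreach ($batch as $i => $name) {
            $params[] = 'name[' . $i . ']=' . urlencode($name);
        }
        $urls[] = 'https://api.genderize.io?' . implode('&', $params);
    }
    return $urls;
}

// Hypothetical helper: fetch every URL while keeping at most $maxParallel
// transfers in flight. Returns response bodies keyed by URL index
// (null for a transfer that failed).
function fetchWithWindow(array $urls, int $maxParallel = 100): array
{
    $urls = array_values($urls);
    $total = count($urls);
    $mh = curl_multi_init();
    $results = [];
    $next = 0;      // index of the next URL to start
    $inFlight = 0;  // number of transfers currently on the multi handle

    while ($inFlight > 0 || $next < $total) {
        // Top up the window with new transfers.
        while ($next < $total && $inFlight < $maxParallel) {
            $ch = curl_init($urls[$next]);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_PRIVATE, (string) $next); // remember the index
            curl_multi_add_handle($mh, $ch);
            $next++;
            $inFlight++;
        }

        curl_multi_exec($mh, $running);

        // Collect finished transfers and free their slots.
        $freed = false;
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $index = (int) curl_getinfo($ch, CURLINFO_PRIVATE);
            $results[$index] = $info['result'] === CURLE_OK
                ? curl_multi_getcontent($ch)
                : null;
            curl_multi_remove_handle($mh, $ch);
            $inFlight--;
            $freed = true;
        }

        // Nothing finished: wait for socket activity instead of spinning.
        if (!$freed && $inFlight > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(1000);
        }
    }

    // Handles are released automatically once they go out of scope.
    ksort($results);
    return $results;
}
```

    Usage would be along the lines of `$bodies = fetchWithWindow(buildGenderizeUrls($firstnames), 200);` followed by `json_decode` on each non-null body. The window size is the knob to tune; the answer above suggests keeping it in the low hundreds.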

    This answer was selected by the asker as the best answer.