doumin4553 2016-05-04 20:47

Can I use curl_multi_init on the same API with different parameters?

I am connecting to the https://genderize.io/ API. I want to scrape from this API as fast as possible, because I might need to do 1,000,000 searches at a time. Is it possible to create 100,000 different curl_init handles (10 names per request), each with different parameters, and then execute them all in parallel? It seems too good to be true. If I can't do this, how else can I speed up the requests? My current code uses one curl_init handle and changes the URL on each iteration of a for loop. Here is my current loop:

$ch3 = curl_init();
for ($x = 0; $x < $loopnumber; $x += 10) {
    // Take up to 10 names for this request; array_slice also copes with a short final batch.
    // http_build_query URL-encodes the names and emits name[0]..name[9] (brackets percent-encoded).
    $query = http_build_query(array('name' => array_slice($firstnames, $x, 10)));
    curl_setopt_array($ch3, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_URL => 'https://api.genderize.io?' . $query,
    ));
    $resp3 = curl_exec($ch3);
    echo $resp3;
    $genderresponse = json_decode($resp3, true);
}

1 answer

  • doutang2382 2016-05-04 21:45

    TL;DR

    Yes, it is possible in theory. But no, it won't work in practice. You had better stay within a few hundred parallel connections.

    The longer story

    You will probably run out of sockets and possibly memory before you can create one million easy handles and add them to a libcurl multi handle.

    If you communicate with a single remote IP address and port and have only one local IP address, each connection needs its own local port number, so you cannot have more than 64K connections in parallel, even in theory. Most operating systems in their default configuration won't even let you reach 64K. (You can do more if you talk to more remote IPs or have more local IPs to bind the connections to.)
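The port limit can be checked with a little arithmetic. The sketch below is only illustrative: the request count comes from the question (1,000,000 names at 10 names per request), while the ephemeral port range of 32768 to 60999 is an assumption (a common Linux default) and may differ on your system.

```php
<?php
// Requests needed for one million names at 10 names per request (numbers from the question).
$totalNames      = 1000000;
$namesPerRequest = 10;
$requests        = (int) ceil($totalNames / $namesPerRequest);

// A TCP port is a 16-bit number, so one local IP talking to one remote IP:port
// has at most 2^16 source ports to choose from.
$portSpace = 1 << 16;

// ASSUMPTION: ephemeral port range 32768-60999 (a common Linux default).
// Check your own system; this range is not taken from the original answer.
$ephemeralLow   = 32768;
$ephemeralHigh  = 60999;
$ephemeralPorts = $ephemeralHigh - $ephemeralLow + 1;

printf("requests needed:        %d\n", $requests);
printf("16-bit port space:      %d\n", $portSpace);
printf("assumed ephemeral ports: %d\n", $ephemeralPorts);
printf("fits in one port space: %s\n", $requests <= $portSpace ? 'yes' : 'no');
```

Under these assumptions, 100,000 simultaneous connections exceed both the 65,536-port space and the much smaller default ephemeral range, which is why running them all in parallel from one local IP is not feasible.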

    For the sake of argument, assume you actually reach 60K simultaneous connections. You will then find that the curl_multi_* API slows to a crawl with that many connections, because it is select/poll based. libcurl itself has an event-based API, which is the recommended one once you go beyond perhaps a few hundred parallel connections, but from within PHP you have no way to access or use it.
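One way to stay within a bounded number of parallel connections is a rolling window over curl_multi: start a limited number of transfers and add a new one each time one finishes. The following is a minimal sketch, not a definitive implementation. The function names `buildBatchUrls` and `fetchAllWindowed` are hypothetical, the window size of 100 is an arbitrary choice, and the sketch does not handle any rate limits the API may impose.

```php
<?php
// Hypothetical helper: turn a list of names into request URLs, 10 names per request.
function buildBatchUrls(array $names, int $perRequest = 10): array
{
    $urls = [];
    foreach (array_chunk($names, $perRequest) as $chunk) {
        // http_build_query URL-encodes each name and emits name[0]..name[n]
        // with the brackets percent-encoded.
        $urls[] = 'https://api.genderize.io?' . http_build_query(['name' => $chunk]);
    }
    return $urls;
}

// Hypothetical helper: fetch all URLs while keeping at most $window transfers in flight.
// Returns response bodies indexed like $urls; null marks a failed transfer.
function fetchAllWindowed(array $urls, int $window = 100): array
{
    $urls     = array_values($urls);
    $total    = count($urls);
    $results  = [];
    $next     = 0;   // index of the next URL to start
    $inFlight = 0;   // transfers currently attached to the multi handle
    $mh       = curl_multi_init();

    while ($next < $total || $inFlight > 0) {
        // Top up the window with new transfers.
        while ($inFlight < $window && $next < $total) {
            $ch = curl_init($urls[$next]);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_PRIVATE, (string) $next); // remember which URL this is
            curl_multi_add_handle($mh, $ch);
            $next++;
            $inFlight++;
        }

        // Drive the transfers, then wait for socket activity (up to 1 second).
        curl_multi_exec($mh, $running);
        if ($running > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(1000); // avoid a busy loop when select has nothing to wait on
        }

        // Collect finished transfers and free their slots.
        while ($info = curl_multi_info_read($mh)) {
            $ch    = $info['handle'];
            $index = (int) curl_getinfo($ch, CURLINFO_PRIVATE);
            $results[$index] = $info['result'] === CURLE_OK
                ? curl_multi_getcontent($ch)
                : null;
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
            $inFlight--;
        }
    }

    curl_multi_close($mh);
    ksort($results);
    return $results;
}
```

A possible usage, assuming `$firstnames` holds the names from the question: `$responses = fetchAllWindowed(buildBatchUrls($firstnames), 100);` followed by `json_decode` on each non-null body. The window keeps socket and memory use roughly constant regardless of how many names are queued.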

    This answer was accepted by the asker as the best answer.
