doumin4553 2016-05-04 20:47

Can I use curl_multi_init on the same API with different parameters?

So I am connecting to the https://genderize.io/ API. I want to scrape from this API as fast as possible because I might need to do 1,000,000 searches at a time. Is it possible to create 100,000 (10 names per request) different curl_init handles with different parameters and then execute them all in parallel? It seems too good to be true if I could. Also, if I can't do this, how else can I speed up the requests? My current code uses one curl_init handle and changes the URL on each iteration of a for loop. Here is my current loop:

$ch3 = curl_init();
// Send the names in batches of 10, reusing a single handle for every request.
for ($x = 0; $x < $loopnumber; $x += 10) {
    // http_build_query() URL-encodes each name and emits name[0]=...&name[1]=...;
    // array_slice() also copes with a final batch of fewer than 10 names.
    $query = http_build_query(array('name' => array_slice($firstnames, $x, 10)));
    curl_setopt_array($ch3, array(
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_URL => 'https://api.genderize.io?' . $query,
    ));
    $resp3 = curl_exec($ch3);
    echo $resp3;
    $genderresponse = json_decode($resp3, true);
    // ... (the rest of the loop body was not included in the post)
}

1 Answer

  • doutang2382 2016-05-04 21:45

    TL;DR

    Yes, it is possible - in theory. But no, it won't work in practice. You had better stay within a few hundred parallel connections.

    The longer story

    You will probably run out of sockets and possibly memory before you can create one million easy handles and add them to a libcurl multi handle.

    If you intend to communicate with a single remote IP address and port number and you have only one local IP address, then, because each connection needs its own local port number, you can't have more than 64K connections in parallel, even in theory. You won't even get to 64K on most operating systems in their default configuration. (You can do more if you talk to more remote IPs or have more local IPs to bind the connections to.)
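To put rough numbers on that port limit, here is a small sketch. The helper name port_range_size() is hypothetical, and the 32768-60999 range is an assumption about a common Linux default; your system may differ, which is why the script also tries to read the live setting where that file exists.

```php
<?php
// Hypothetical helper: number of ports in an inclusive range.
function port_range_size(int $low, int $high): int
{
    return $high - $low + 1;
}

// Port numbers are 16-bit and port 0 is reserved, hence the "64K" ceiling.
$theoretical = port_range_size(1, 65535);

// ASSUMPTION: a common Linux default ephemeral range; check your own system.
$assumedLinuxDefault = port_range_size(32768, 60999);

printf("Theoretical ceiling:    %d ports\n", $theoretical);
printf("Assumed Linux default:  %d ports\n", $assumedLinuxDefault);

// On Linux the live setting is exposed here; elsewhere this branch is skipped.
$path = '/proc/sys/net/ipv4/ip_local_port_range';
if (is_readable($path)) {
    $parts = preg_split('/\s+/', trim(file_get_contents($path)));
    if (count($parts) === 2) {
        printf("This machine's range:   %d ports\n",
            port_range_size((int) $parts[0], (int) $parts[1]));
    }
}
```

Under those assumptions, a default setup allows fewer than half of the theoretical 64K connections to one remote IP and port.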

    For the sake of argument, assume you actually reach 60K simultaneous connections: you'll then find that the curl_multi_* API slows to a crawl with that many connections, because it is select/poll based. libcurl itself has an event-based API, which is the recommended one once you go beyond perhaps a few hundred parallel connections, but from within PHP you have no way to access or use it.
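One way to stay within a few hundred parallel connections is a rolling window: keep a fixed number of transfers in flight and add a new one each time one finishes. The following is only a sketch under stated assumptions. The function names build_batch_urls() and fetch_all() and the default cap of 100 are made up for illustration, error handling is minimal, and it does not account for any rate limits the API may impose.

```php
<?php
// Hypothetical helper: turn a list of names into request URLs,
// with up to $perRequest names per URL (name[0]=...&name[1]=...).
function build_batch_urls(array $names, int $perRequest = 10): array
{
    $urls = [];
    foreach (array_chunk($names, $perRequest) as $chunk) {
        $urls[] = 'https://api.genderize.io?' . http_build_query(['name' => $chunk]);
    }
    return $urls;
}

// Hypothetical helper: fetch every URL with at most $maxParallel transfers
// in flight. Returns response bodies keyed by URL index (null on failure).
function fetch_all(array $urls, int $maxParallel = 100): array
{
    $mh = curl_multi_init();
    $results = [];
    $next = 0;       // index of the next URL to start
    $inFlight = 0;   // transfers currently attached to the multi handle
    $total = count($urls);

    $addNext = function () use (&$next, &$inFlight, $urls, $mh) {
        $ch = curl_init($urls[$next]);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_PRIVATE, (string) $next); // remember which URL this is
        curl_multi_add_handle($mh, $ch);
        $next++;
        $inFlight++;
    };

    // Fill the initial window.
    while ($next < $total && $inFlight < $maxParallel) {
        $addNext();
    }

    while ($inFlight > 0) {
        curl_multi_exec($mh, $running);
        // Block until there is socket activity (or the timeout expires).
        if (curl_multi_select($mh, 1.0) === -1) {
            usleep(1000); // avoid a busy loop when select cannot wait
        }
        // Collect finished transfers and refill the window.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $index = (int) curl_getinfo($ch, CURLINFO_PRIVATE);
            $results[$index] = $info['result'] === CURLE_OK
                ? curl_multi_getcontent($ch)
                : null;
            curl_multi_remove_handle($mh, $ch); // handle is freed once unreferenced
            $inFlight--;
            if ($next < $total) {
                $addNext();
            }
        }
    }

    curl_multi_close($mh);
    return $results;
}
```

With the question's variables, usage would look like fetch_all(build_batch_urls($firstnames), 100), followed by json_decode() on each non-null result. The cap of 100 is an arbitrary starting point to tune, not a recommendation.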

    Accepted by the asker as the best answer.
