douwu5009
2019-02-01 17:50
浏览 366
已采纳

PHP Multi-cURL请求延迟到超时

Summary

I have some PHP 5.4 code which fetches a batch of Facebook/Instagram photos in parallel using multi curl. This code has been working for years, and nothing has changed as far as I can tell.

I add multiple curl requests to a 'multi' request. Each curl request gets a CURLOPT_TIMEOUT. The problem I'm seeing is that, all of a sudden, some of my requests don't complete until this timeout is reached (no matter what timeout I set).

Code

I do something like this (simplified):

do {
    while (CURLM_CALL_MULTI_PERFORM === curl_multi_exec($mh, $running));

    // Wait for activity on any curl-connection (optional, reduces CPU)
    curl_multi_select($mh);

    // a request was just completed -- find out which one
    while($done = curl_multi_info_read($mh))
    {
        $completedCurlRequest = $done['handle'];

        //save the file
        do_some_work(completedCurlRequest);

        curl_multi_remove_handle($mh, $completedCurlRequest);
    }
} while ($running);

I use this script to run batches of about 40 parallel requests to fetch some images (from Facebook). Most of them take about 500ms to complete. However, a few of the requests "hang" (until the CURLOPT_TIMEOUT) before they arrive.

Basically the curl_multi_select step takes the entire timeout. Or, if I remove that curl_multi_select line, the outer loop spins (burning CPU) until the timeout.

Considerations

  • It doesn't matter what the timeout is - if I set the timeout to 30s, they arrive after 30 seconds, If I set the timeout to 1s, they arrive after 1s!

  • This is a really sudden change that does not correlate with any code release - it was all working fine up until 30th Jan 2019, but on the 31st it suddenly stopped working.

  • This isn't easy to reproduce, as it only affects an image once. If I repeat it for a batch of images I already fetched, it works fine the next time round.

  • It affects both Facebook and Instagram images, so I think the issue must be to do with my code or my server (and not Facebook or Instagram), as they wouldn't have both changed something simultaneously.

Questions

  1. Am I doing something wrong in my use of multi-curl that could cause this? (but if so, what's changed?)
  2. Have Facebook and Instagram changed anything that might cause this?
  3. Could something on my server have changed to trigger this?
  4. How can I debug this?

Update Here is the what I get back from a slow request when it finally completes:

INFO

"content_type": "image/jpeg",
"http_code": 200,
"header_size": 377,
"request_size": 180,
"total_time": 15.001012,    //<----- Total time == CURLOPT_TIMEOUT
"namelookup_time": 0.007149,
"connect_time": 0.12018,
"pretransfer_time": 0.441911,
"size_download": 40714,
"speed_download": 2714,
"download_content_length": -1,   //<------Not set

HEADER

HTTP/2 200 
content-type: image/jpeg
x-haystack-needlechecksum: 3529661797
timing-allow-origin: *
access-control-allow-origin: *
cache-control: max-age=1209600, no-transform
date: Mon, 04 Feb 2019 14:04:17 GMT
access-control-expose-headers: X-FB-CEC-Video-Limit

It is missing the content-length header, but that always seems to be the case the first time a file is fetched. Only 1 or 2 of the 50 parallel requests are slow, yet all of the requests are missing their content length headers.

If I fetch the same file again, it is much quicker, and I do see content length being set this time

INFO

"download_content_length": 52721,

HEADER

content-length: 52721           
  • 写回答
  • 关注问题
  • 收藏
  • 邀请回答

1条回答 默认 最新

  • dongshi1914 2019-02-04 09:28
    已采纳

    My current theory is that there is a bug in Facebook fileserver that means the connection is sometimes not being closed even though the data has been sent, so the connection stays open until the timeout. In the absence of the (optional) content-length header being sent by Facebook's fileserver, cURL can't know if the payload is complete, and so hangs.

    My current solution is to 'prime' the fileserver by first making a request for the image without a body, like this:

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_NOBODY, 1);
    curl_exec($ch);
    

    This is a pretty quick process, since there is no image being returned. I actually do this in the background using asynchronous multi curl, so I can get on with doing some other processing.

    After priming the fileserver, subsequent requests for the files are even quicker than before, as the content-length is known.

    This is a bit of a clumsy approach, but in the absence of any response from Facebook so far I'm not sure what else to do.

    打赏 评论