dongxia8656 2015-06-30 12:42
22 views

PHP cURL retrieves only part of the page

I have the following CURL code:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
// Send a POST request only when there are parameters to post
if ($postParameters != '') {
    curl_setopt($ch, CURLOPT_POST, TRUE);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postParameters);
}
// Read and write cookies from the same file (note the '/' separator on both)
curl_setopt($ch, CURLOPT_COOKIEFILE, __DIR__ . '/cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEJAR, __DIR__ . '/cookie.txt');
// Empty string: advertise every encoding libcurl supports and decode the response
curl_setopt($ch, CURLOPT_ENCODING, '');
// Certificate checks disabled (insecure)
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
// Return the body from curl_exec() instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
curl_setopt($ch, CURLOPT_REFERER, $referer);
$pageResponse = curl_exec($ch);
curl_close($ch);

Most of the time I get the entire page I request. Occasionally, however, I get only part of the page, for example:

DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en"> head> meta http-equiv="Content-Type" content="text/html; charset=windows-1251" /> meta name="generator" content="

I removed the "<" from the start of each tag so that the HTML would display on Stack Exchange. Does anybody know why it suddenly stops receiving data? I noticed that the data often stops abruptly right after an opening double quote (e.g. content=" or username="), although I'm not 100% sure it always happens that way. Could this be an encoding issue? Any other ideas?

Any help would be appreciated.
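To log or retry only the bad fetches, it can help to detect the symptom in code rather than by eye. A minimal sketch, assuming the pages are complete HTML documents that end in a closing </html> tag; the function name looksTruncated is hypothetical and the check is only a heuristic:

```php
<?php
/**
 * Hypothetical heuristic: treat an HTML page as truncated when its closing
 * </html> tag is missing. Only meaningful for full HTML documents.
 */
function looksTruncated($html): bool
{
    // curl_exec() can return false; a missing or empty body counts as truncated
    if (!is_string($html) || $html === '') {
        return true;
    }
    // Case-insensitive search, since some servers emit </HTML>
    return stripos($html, '</html>') === false;
}

$partial = '<html xmlns="http://www.w3.org/1999/xhtml"><head><meta name="generator" content="';
var_dump(looksTruncated($partial));                       // bool(true)
var_dump(looksTruncated('<html><body>ok</body></HTML>')); // bool(false)
```

A page that legitimately omits </html> (the tag is optional in HTML) would be flagged too, so this is a debugging aid rather than a validity test.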


1 answer

  • dougang1605 2015-06-30 13:36

    You can try adding some debugging output.

    Add these options:

    // Write libcurl's verbose log (request, headers, connection details) to error.log
    $f = fopen(__DIR__ . "/error.log", "w+");
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    curl_setopt($ch, CURLOPT_STDERR, $f);
    

    And add this check before curl_close():

    // curl_errno() returns 0 if the last transfer succeeded
    if ($errno = curl_errno($ch)) {
        $error_message = curl_strerror($errno);
        echo "cURL error ({$errno}):\n{$error_message}";
    }
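The debugging steps above can be combined into one reusable fetch so that every response comes back with its error state and a few transfer statistics. A minimal sketch, not a drop-in replacement for the question's code (it omits POST, cookies and the referer); fetchWithDiagnostics and the keys of the returned array are hypothetical names:

```php
<?php
/**
 * Hypothetical helper: fetch $url and return the body together with the
 * cURL error state and a few transfer statistics from curl_getinfo().
 */
function fetchWithDiagnostics(string $url, int $timeout = 60): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_ENCODING, '');          // accept any supported encoding
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);

    $body  = curl_exec($ch);    // usually false when the transfer fails; check errno too
    $errno = curl_errno($ch);   // 0 means libcurl reported no error
    $info  = curl_getinfo($ch); // status code, sizes, timings, ...

    $result = [
        'body'          => $body,
        'errno'         => $errno,
        'error'         => $errno ? curl_strerror($errno) : '',
        'http_code'     => $info['http_code'],
        'size_download' => $info['size_download'],
        'total_time'    => $info['total_time'],
    ];
    curl_close($ch);
    return $result;
}
```

Logging errno, size_download and total_time next to each truncated page should show whether the truncations coincide with an error code or with transfers that run close to the timeout.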
    

    If that doesn't reveal anything, try increasing the timeout and see whether the problem goes away:

    curl_setopt($ch, CURLOPT_TIMEOUT, 300); 
    

    If raising the timeout fixes it, then find out why the transfer takes so long.
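One way to find out why is to see where the time goes. curl_getinfo($ch), called before curl_close(), returns timings that are cumulative from the start of the transfer, so subtracting consecutive values gives a rough per-phase breakdown. A minimal sketch; the function name transferPhases and the phase labels are hypothetical, and the numbers are approximate, especially when redirects are followed:

```php
<?php
/**
 * Hypothetical helper: turn the cumulative timings from curl_getinfo()
 * into approximate per-phase durations in seconds.
 */
function transferPhases(array $info): array
{
    return [
        'dns'           => $info['namelookup_time'],
        'connect'       => $info['connect_time'] - $info['namelookup_time'],
        'tls_and_setup' => $info['pretransfer_time'] - $info['connect_time'],
        'server_wait'   => $info['starttransfer_time'] - $info['pretransfer_time'],
        'download'      => $info['total_time'] - $info['starttransfer_time'],
    ];
}

// Made-up numbers: most of the time is spent receiving the body
$info = [
    'namelookup_time'    => 0.5,
    'connect_time'       => 0.75,
    'pretransfer_time'   => 1.0,
    'starttransfer_time' => 2.0,
    'total_time'         => 60.0,
];
print_r(transferPhases($info));
```

In the question's code this would be print_r(transferPhases(curl_getinfo($ch))); just before curl_close($ch). A large download share suggests the body itself arrives slowly, while a large server_wait points at the server taking long to start responding.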


