drhdjp97757 2015-09-07 10:05
Views: 68
Accepted

HTML parsing error: "Service out of order" when trying to parse a website

I want to parse a website, but I always get the error "Service out of order".

It happens no matter what start or end string I give. I also tried another URL and copied complete examples that work for other users, but they do not work for me. I also tried increasing the size to 20000, but nothing works.

Here is my PHP script:

<?php
// URL to be searched
$url = "http://cordis.europa.eu/project/rcn/85400_en.html";

// String that precedes the relevant entries
$startstring = "<div class='tech'><p>";

// Up to the next HTML tag, i.e. the string that follows the relevant entries
$endstring = "<"; 

$file = @fopen ($url,"r");

if($file)
{
    echo "URL found<br>";
}

if (trim($file) == "") {
    echo "Service out of order - File:".$file."<br>";
    } else {
    $i=0;
    while (!feof($file)) {

        // If the file is large, it may be necessary to increase the number
        // 2000 accordingly. In the event of a buffer overflow, PHP prints
        // a corresponding error message.

        $zeile[$i] = fgets($file,20000);
        $i++;
    }
    fclose($file);
}

// Data filtering

for ($j=0;$j<$i;$j++) {
    if ($resa = strstr($zeile[$j],$startstring)) {
        $resb = str_replace($startstring, "", $resa);
        $endstueck = strstr($resb, $endstring);
        $resultat .= str_replace($endstueck,"",$resb);
        $resultat .= "; ";
    }
}

// Data output

echo ("Result = ".$resultat."<br>");
return $resultat;

Any help is appreciated. Thanks in advance.

EDIT: The URL is found and $file has a value: Resource id #3

2 answers

  • doudong8713 2015-09-07 10:48

    Use this; it gives the expected output.

    <?php
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "http://cordis.europa.eu/project/rcn/85400_en.html");
    // CURLOPT_GET is not a cURL option; CURLOPT_HTTPGET is the one that forces a GET request
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // An empty string sends an Accept-Encoding header listing every encoding
    // cURL supports and decodes the response automatically
    curl_setopt($ch, CURLOPT_ENCODING, '');

    $headers = array();
    $headers[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8';
    $headers[] = 'Accept-Language: en-US,en;q=0.8';
    $headers[] = 'Cache-Control: max-age=0';
    $headers[] = 'Connection: keep-alive';
    // Cookie copied from a browser session; kept on a single line
    $headers[] = 'Cookie: CORDIS=14.141.177.158.1441621012200552; PHPSESSID=jrf2e3t4vu56acdkf9np0tat06; WT_FPC=id=14.141.177.158-1441621016.978424:lv=1441605951963:ss=1441604805004';
    $headers[] = 'User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36';
    $headers[] = 'Host: cordis.europa.eu';
    // "Request URL" is a label from the browser's developer tools, not an HTTP header, so it is not sent

    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    $server_output = curl_exec($ch);
    if ($server_output === false) {
        die('cURL error: ' . curl_error($ch));
    }
    curl_close($ch);

    // Collect HTML parser warnings internally instead of printing them
    libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($server_output);
    $xpath = new DOMXPath($dom);

    // First element whose class attribute is exactly "tech"
    $div = $xpath->query("//*[@class='tech']")->item(0);
    if ($div === null) {
        die('No element with class "tech" found');
    }
    $data = trim($div->textContent);
    echo $data;
    ?>
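
As for why the script in the question prints "Service out of order" even though the URL opens: `fopen()` returns a stream resource on success and `false` on failure, while `trim()` expects a string, so `trim($file) == ""` does not test whether the request succeeded. On PHP versions of that time the call most likely returned NULL with a warning, and NULL compares equal to an empty string. The following is only a minimal sketch of the same start/end string approach with an explicit failure check, assuming PHP 7 or later; `fetchHtml` and `extractBetween` are hypothetical helper names, not part of the original post.

```php
<?php
// Hypothetical helper: fetch a URL and fail explicitly when it cannot be read.
function fetchHtml(string $url): string
{
    // file_get_contents() returns false on failure, so compare with ===
    $html = @file_get_contents($url);
    if ($html === false) {
        throw new RuntimeException("Service out of order - could not read $url");
    }
    return $html;
}

// Hypothetical helper: return the text between each occurrence of $start
// and the next occurrence of $end that follows it.
function extractBetween(string $html, string $start, string $end): array
{
    $results = [];
    if ($start === '') {
        return $results; // an empty start marker cannot be searched for
    }
    $offset = 0;
    while (($from = strpos($html, $start, $offset)) !== false) {
        $from += strlen($start);          // skip past the start marker
        $to = strpos($html, $end, $from); // next end marker after it
        if ($to === false) {
            break;                        // no end marker left
        }
        $results[] = substr($html, $from, $to - $from);
        $offset = $to;                    // continue searching from the end marker
    }
    return $results;
}
```

With these helpers the question's logic would read roughly as `$entries = extractBetween(fetchHtml($url), $startstring, $endstring);` followed by `echo implode('; ', $entries);`. Note that the start string must match the page source character for character, including the quote style used around attribute values.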
    


    This answer was selected as the best answer by the asker.