XPath returns an empty node list when scraping text

I'm building a small scraping tool that will scrape the URLs from a Google results page. I'm trying to get the text value of each "cite" element, which holds the URL. I load the page with curl and pass the result to DOMDocument's loadHTML. When I print_r the curl result, the page is displayed, so there is no problem with curl.

Below is my code:

    $dom = new DOMDocument();
    $dom->loadHTML($result);

    $xpath = new DOMXPath($dom);

    $elements = $xpath->query("//cite[@class='vurls']");

    print_r($elements);

    foreach ($elements as $entry)
    {
        // show cite url
        print_r($entry);
    }

When I use //cite[@class='vurls'] in the Firefox XPath checker, it evaluates and shows all the cite text, but in my code $elements is always empty.

I also tried the full path in my query:

    //div[@id='ires']/ol[@id='rso']//li/div/div/div/div/cite

but it still returns an empty result.

An example query is:

http://www.google.co.uk/search?q=xpath

Can someone please tell me what I am doing wrong?

1 answer

Google serves different HTML depending on the browser used. Look at the HTML you actually receive in PHP, not the HTML Firefox shows. In that HTML the <cite/> elements have no @class attribute, so you need to find another way to query them, e.g.

//div[@class='kv']/cite
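
To see what PHP actually receives, it can help to list every cite element together with its class attribute before settling on a query. The following is only a minimal sketch: the `$result` string is a made-up stand-in for the curl response, and the `kv` class is taken from the query above, so the real markup may differ and change at any time.

```php
<?php
// Hypothetical sample standing in for the HTML that curl returned in $result.
// The real Google markup will differ; inspect your own response.
$result = '<html><body>'
        . '<div class="kv"><cite>www.example.com/xpath</cite></div>'
        . '<div class="kv"><cite>example.org/tutorial</cite></div>'
        . '</body></html>';

// Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($result);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Step 1: list every <cite> and its class attribute (empty string if absent),
// to check which attributes exist in the HTML served to PHP.
foreach ($xpath->query('//cite') as $cite) {
    echo 'cite class="' . $cite->getAttribute('class') . '": '
       . $cite->nodeValue . PHP_EOL;
}

// Step 2: query by the parent <div> instead of a class on <cite> itself.
$urls = [];
foreach ($xpath->query("//div[@class='kv']/cite") as $cite) {
    $urls[] = trim($cite->textContent);
}
print_r($urls);
```

Reading `nodeValue` or `textContent` on each node gives the URL text directly, which is easier to read than the output of print_r on a node or node list.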

Anyway: don't parse Google search results; they offer an API for that. Scraped pages are likely to break, because the markup changes over time (and often), whereas APIs are stable.
