doujiao2443 2015-06-06 09:23
浏览 134
已采纳

在PHP中使用XPath获取href属性

I am new to PHP and trying to write a scrapper for a website.

I am trying to get an element with class name categories. I have use

$showPage = '<li class="categories">Categories<ul>  <li class="cat-item cat-item-940"><a href="http://www.desitvbox.me/category/star-plus/amul-taste-of-india/" >Amul Taste of India</a>
</li>
    <li class="cat-item cat-item-942"><a href="http://www.desitvbox.me/category/star-plus/dance-plus/" >Dance Plus</a>
</li>
    <li class="cat-item cat-item-239"><a href="http://www.desitvbox.me/category/star-plus/diya-aur-baati-hum-star/" >Diya Aur Baati Hum</a>
</li>
    <li class="cat-item cat-item-745"><a href="http://www.desitvbox.me/category/star-plus/suhani-si-ek-ladki/" >Suhani Si Ek Ladki</a>
</li>
    <li class="cat-item cat-item-147"><a href="http://www.desitvbox.me/category/star-plus/star-plus-completed-shows/" >Star Plus Completed Shows</a>
<ul class="children">
    <li class="cat-item cat-item-772"><a href="http://www.desitvbox.me/category/star-plus/star-plus-completed-shows/airlines/" >Airlines</a>
</li>
    <li class="cat-item cat-item-518"><a href="http://www.desitvbox.me/category/star-plus/star-plus-completed-shows/arjun/" >Arjun</a>
</li>
    <li class="cat-item cat-item-237"><a href="http://www.desitvbox.me/category/star-plus/star-plus-completed-shows/chef-pankaj-ka-zayka/" >Chef Pankaj Ka Zayka</a>
</li>
</ul>
</li>
</ul></li>';   
$dom = new DOMDocument();
$dom->validateOnParse = true;
$dom->loadHTML($showPage);  
$dom->preserveWhiteSpace = false;

$allShowsList = new DOMXPath($dom);
$allShowsTableHTML = $allShowsList->query('//li[contains(@class, "categories")]'); 

However, I want to now read the values of all a href mentioned in $allShowsTableHTML.

Can you please advise how can I do that?

As you can see one the record also have ul class = 'childern'. which I also want to read.

I need to get the href and the title.

I have tried below but no result.

$allShowTableDom = new DOMDocument();
foreach ($allShowTableHTML as $showLink)
{
    $allShowTableDom->appendChild($allShowTableDom->importNode($showLink,true));
} 
$showsArray = $allShowsTableHTML->getElementsByTagName('a');

I think it is not going in foreach loop.

  • 写回答

1条回答

  • dsaaqdz6223 2015-06-06 09:58
    关注

    To get all href attributes of the hyperlinks, add some more axis steps, finally loop over the result list, where the ->value property will contain the URIs.

    Given you can just dump all href attributes inside the whole <li> element, simply extend your query by //a/@href:

    $document = new DOMXPath($dom);
    $hrefs = $document->query('//li[contains(@class, "categories")]//a/@href'); 
    
    foreach ($hrefs as $href) {
      echo $href->value;
    }
    

    If this contains nodes you don't want to get, you could also descend the contain unsorted list and select with a more specific query:

    //li[contains(@class, "categories")]/ul/li/a/@href
    
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 求差集那个函数有问题,有无佬可以解决
  • ¥15 【提问】基于Invest的水源涵养
  • ¥20 微信网友居然可以通过vx号找到我绑的手机号
  • ¥15 寻一个支付宝扫码远程授权登录的软件助手app
  • ¥15 解riccati方程组
  • ¥15 display:none;样式在嵌套结构中的已设置了display样式的元素上不起作用?
  • ¥15 使用rabbitMQ 消息队列作为url源进行多线程爬取时,总有几个url没有处理的问题。
  • ¥15 Ubuntu在安装序列比对软件STAR时出现报错如何解决
  • ¥50 树莓派安卓APK系统签名
  • ¥65 汇编语言除法溢出问题