Getting the href attribute with XPath in PHP

I am new to PHP and am trying to write a scraper for a website.

I am trying to get an element with the class name categories. I have used:

$showPage = '<li class="categories">Categories<ul>  <li class="cat-item cat-item-940"><a href="http://www.desitvbox.me/category/star-plus/amul-taste-of-india/" >Amul Taste of India</a>
</li>
    <li class="cat-item cat-item-942"><a href="http://www.desitvbox.me/category/star-plus/dance-plus/" >Dance Plus</a>
</li>
    <li class="cat-item cat-item-239"><a href="http://www.desitvbox.me/category/star-plus/diya-aur-baati-hum-star/" >Diya Aur Baati Hum</a>
</li>
    <li class="cat-item cat-item-745"><a href="http://www.desitvbox.me/category/star-plus/suhani-si-ek-ladki/" >Suhani Si Ek Ladki</a>
</li>
    <li class="cat-item cat-item-147"><a href="http://www.desitvbox.me/category/star-plus/star-plus-completed-shows/" >Star Plus Completed Shows</a>
<ul class="children">
    <li class="cat-item cat-item-772"><a href="http://www.desitvbox.me/category/star-plus/star-plus-completed-shows/airlines/" >Airlines</a>
</li>
    <li class="cat-item cat-item-518"><a href="http://www.desitvbox.me/category/star-plus/star-plus-completed-shows/arjun/" >Arjun</a>
</li>
    <li class="cat-item cat-item-237"><a href="http://www.desitvbox.me/category/star-plus/star-plus-completed-shows/chef-pankaj-ka-zayka/" >Chef Pankaj Ka Zayka</a>
</li>
</ul>
</li>
</ul></li>';   
$dom = new DOMDocument();
$dom->validateOnParse = true;
$dom->loadHTML($showPage);  
$dom->preserveWhiteSpace = false;

$allShowsList = new DOMXPath($dom);
$allShowsTableHTML = $allShowsList->query('//li[contains(@class, "categories")]'); 

However, I now want to read the href values of all the a elements inside $allShowsTableHTML.

Can you please advise how I can do that?

As you can see, one of the records also contains a ul with class 'children', which I also want to read.

I need to get the href and the title.

I have tried the code below, but got no result.

$allShowTableDom = new DOMDocument();
foreach ($allShowTableHTML as $showLink)
{
    $allShowTableDom->appendChild($allShowTableDom->importNode($showLink,true));
} 
$showsArray = $allShowsTableHTML->getElementsByTagName('a');

I think it is not entering the foreach loop.

dongliehuan3925 Please look up the difference between XPath and XQuery. You are using XPath; XQuery is a superset of it that PHP does not support natively.
About 5 years ago

1 answer



To get all href attributes of the hyperlinks, add a few more axis steps, then loop over the result list; the ->value property of each item contains the URI.

Given that you just want all href attributes inside the whole <li> element, simply extend your query with //a/@href:

$document = new DOMXPath($dom);
$hrefs = $document->query('//li[contains(@class, "categories")]//a/@href'); 

foreach ($hrefs as $href) {
  echo $href->value;
}

If this also returns nodes you don't want, you could instead descend into the contained unordered list and select with a more specific query:

//li[contains(@class, "categories")]/ul/li/a/@href
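
To make the difference between the two queries concrete, here is a minimal sketch run against a reduced, made-up sample. The example.com URLs and the one-letter link names are placeholders, not taken from the question. It assumes the DOM extension is enabled, and it suppresses parser warnings with libxml_use_internal_errors(), which the code above does not do.

```php
<?php
// Hypothetical, shortened markup with the same nesting as in the question:
// two top-level items, one of which has a nested "children" list.
$html = '<li class="categories">Categories<ul>'
      . '<li class="cat-item"><a href="http://example.com/a/">A</a></li>'
      . '<li class="cat-item"><a href="http://example.com/b/">B</a>'
      . '<ul class="children">'
      . '<li class="cat-item"><a href="http://example.com/b/c/">C</a></li>'
      . '</ul></li>'
      . '</ul></li>';

// Collect HTML parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Descendant step (//a): every link below the "categories" item,
// including the ones in the nested list.
$allHrefs = [];
foreach ($xpath->query('//li[contains(@class, "categories")]//a/@href') as $href) {
    $allHrefs[] = $href->value;
}

// Child steps only (/ul/li/a): just the links of the first-level list.
$topLevelHrefs = [];
foreach ($xpath->query('//li[contains(@class, "categories")]/ul/li/a/@href') as $href) {
    $topLevelHrefs[] = $href->value;
}

echo count($allHrefs), " links with //a, ", count($topLevelHrefs), " links with /ul/li/a\n";
```

In this sample the first query also picks up the link inside the nested children list, while the second one stops at the first-level list items.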

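doucanshou6998 Sorry about that. Thank you very much for your help.
About 5 years ago
duanguochi6194 Please have a look at the FAQ; we really don't like follow-up questions that aren't immediately related. Anyway: you will have to remove the @href axis step and use PHP's DOM to access both attributes.
About 5 years ago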
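
The suggestion in the comment above (drop the @href step and read the values from the DOM elements) could look roughly like the following sketch. It is not the commenter's own code. It assumes that "title" means the visible link text rather than a title attribute, uses only two of the links from the question as sample input, and the $links array with its 'href' and 'title' keys is a name chosen here for illustration.

```php
<?php
// Excerpt of the markup from the question, reduced to two links.
$showPage = '<li class="categories">Categories<ul>'
    . '<li class="cat-item cat-item-942"><a href="http://www.desitvbox.me/category/star-plus/dance-plus/" >Dance Plus</a></li>'
    . '<li class="cat-item cat-item-745"><a href="http://www.desitvbox.me/category/star-plus/suhani-si-ek-ladki/" >Suhani Si Ek Ladki</a></li>'
    . '</ul></li>';

// Collect HTML parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($showPage);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Select the <a> elements themselves, not their @href attributes,
// so both the attribute and the text are reachable from each node.
$links = [];
foreach ($xpath->query('//li[contains(@class, "categories")]//a') as $a) {
    $links[] = [
        'href'  => $a->getAttribute('href'),   // the URI
        'title' => trim($a->textContent),      // the visible link text
    ];
}

foreach ($links as $link) {
    echo $link['title'], ' => ', $link['href'], "\n";
}
```

Because //a is a descendant step, links inside a nested children list would be collected as well.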
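duanluo5096 Thanks for your answer. It worked perfectly. Could you also suggest how I can get the title of the "a" tag, i.e. the name of the link, such as "Suhani Si Ek Ladki"? I really appreciate your help.
About 5 years ago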