XPath: getting text content from multiple complex tags

I have this HTML template:

<center>
    <img src="image1">
    <br><br>
    <img src="image2">
    <br><br>
    <strong><em>TITLE1 :</em></strong> DESC1<br>
    <strong><em>TITLE2 :</em></strong> DESC2<br>
    <strong><em>TITLE3 :</em></strong> DESC3<br>
    <strong><em>TITLE4 :</em></strong> DESC4<br>
    <strong><em>TITLE5 :</em></strong> DESC5<br><br><br>
    <img src="image3">
    <br><br><br>DESC_GEN
</center>

I want to use XPath to get this expected result:

TITLE 1 = DESC 1
TITLE 2 = DESC 2
TITLE 3 = DESC 3
TITLE 4 = DESC 4
TITLE 5 = DESC 5
general = DESC_GEN

The values should end up in an array so I can use them elsewhere in my code.

This is what I have tried:

$dom = new DOMDocument();
$dom->loadHTML($html_string);
$xpath = new DOMXpath($dom);

$elements = $xpath->query("//em");
foreach($elements as $e) {
    echo $e->nodeValue . '<br/>';
}

But unfortunately this returns only TITLE 1, TITLE 2, TITLE 3, etc.

I want to get their respective values (in this case DESC 1, DESC 2, etc.).

What is the approach I can take to achieve this goal?

2 answers




Just FYI, the HTML template you are using is not a well-formed XML document. It may or may not cause problems, depending on your parser.

The easiest way to get what you want is probably to first get the list of titles with

//em/text()

Then get the list of descriptions with

//em/following::text()[1]

Then the general description with

//center/text()[last()]

Finally just do some string manipulation to get it to the form that you want.
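
The three queries above can be combined into a single array roughly as follows. This is only a sketch under a few assumptions: the markup is the template from the question (inlined here as `$html_string`), the trailing " :" is stripped from each title, and the `general` key name is taken from the expected output in the question. It is not the only way to shape the result.

```php
<?php
// Template copied from the question; a nowdoc avoids any variable interpolation.
$html_string = <<<'HTML'
<center>
    <img src="image1">
    <br><br>
    <img src="image2">
    <br><br>
    <strong><em>TITLE1 :</em></strong> DESC1<br>
    <strong><em>TITLE2 :</em></strong> DESC2<br>
    <strong><em>TITLE3 :</em></strong> DESC3<br>
    <strong><em>TITLE4 :</em></strong> DESC4<br>
    <strong><em>TITLE5 :</em></strong> DESC5<br><br><br>
    <img src="image3">
    <br><br><br>DESC_GEN
</center>
HTML;

// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html_string);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$result = [];

foreach ($xpath->query('//em') as $em) {
    // "TITLE1 :" becomes "TITLE1" (trailing spaces and colon removed).
    $title = rtrim(trim($em->textContent), ' :');

    // First text node after this <em> in document order, i.e. its description.
    $desc = $xpath->query('following::text()[1]', $em)->item(0);
    $result[$title] = $desc !== null ? trim($desc->nodeValue) : '';
}

// The last text node directly inside <center> is the general description.
$general = $xpath->query('//center/text()[last()]')->item(0);
$result['general'] = $general !== null ? trim($general->nodeValue) : '';

print_r($result);
```

Running the relative query once per `em` keeps each title paired with its own description, so a title with no text after it would not shift the remaining pairs.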


Note that the actual XPath expressions may vary depending on the specific HTML document. However, the above should work for the template that you provided.

From each em, walk up to its parent, which is the strong element (.. in XPath), then select the text() node that follows it:

$elements = $xpath->query("//em");
foreach($elements as $e) {
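    // ".." is the parent <strong>; [1] keeps only the nearest following text node
    $desc = $xpath->query("../following-sibling::text()[1]", $e)->item(0);
    echo $e->nodeValue . ($desc ? $desc->nodeValue : '') . "<br/>";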
}