du13520157325 2018-11-20 18:42

XPath: getting text content from multiple complex tags

I have this HTML template:

<center>
    <img src="image1">
    <br><br>
    <img src="image2">
    <br><br>
    <strong><em>TITLE1 :</em></strong> DESC1<br>
    <strong><em>TITLE2 :</em></strong> DESC2<br>
    <strong><em>TITLE3 :</em></strong> DESC3<br>
    <strong><em>TITLE4 :</em></strong> DESC4<br>
    <strong><em>TITLE5 :</em></strong> DESC5<br><br><br>
    <img src="image3">
    <br><br><br>DESC_GEN
</center>

I want to use XPath to get this expected result:

TITLE1 = DESC1
TITLE2 = DESC2
TITLE3 = DESC3
TITLE4 = DESC4
TITLE5 = DESC5
general = DESC_GEN

I want the result in an array so I can use the values elsewhere in my code.

This is what I have tried:

$dom = new DOMDocument();
$dom->loadHTML($html_string);  // parse the HTML template
$xpath = new DOMXPath($dom);

// Select every <em> element and print its text content
$elements = $xpath->query("//em");
foreach ($elements as $e) {
    echo $e->nodeValue . '<br/>';
}

Unfortunately, this returns only the titles (TITLE1 :, TITLE2 :, TITLE3 :, etc.).

I want to get their respective values as well (in this case DESC1, DESC2, etc.).

What approach can I take to achieve this?

2 answers

  • dounangqie4819 2018-11-20 22:11

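    Just FYI, the HTML template you are using is not a well-formed XML document: the <img> and <br> tags are never closed. Whether that causes problems depends on your parser; a strict XML parser will reject it, while an HTML parser (such as the one behind PHP's DOMDocument::loadHTML) accepts it.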
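A minimal sketch, assuming PHP with the bundled DOM extension, that feeds the question's template to both the strict XML parser (DOMDocument::loadXML) and the HTML parser (DOMDocument::loadHTML). The variable names are illustrative only.

```php
<?php
// The template from the question, verbatim.
$html = <<<'HTML'
<center>
    <img src="image1">
    <br><br>
    <img src="image2">
    <br><br>
    <strong><em>TITLE1 :</em></strong> DESC1<br>
    <strong><em>TITLE2 :</em></strong> DESC2<br>
    <strong><em>TITLE3 :</em></strong> DESC3<br>
    <strong><em>TITLE4 :</em></strong> DESC4<br>
    <strong><em>TITLE5 :</em></strong> DESC5<br><br><br>
    <img src="image3">
    <br><br><br>DESC_GEN
</center>
HTML;

// Collect libxml parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Strict XML parsing: the unclosed <img> and <br> tags make the document malformed.
$asXml = new DOMDocument();
$xmlOk = $asXml->loadXML($html);

// HTML parsing: <img> and <br> are void elements, so no closing tags are needed.
$asHtml = new DOMDocument();
$htmlOk = $asHtml->loadHTML($html);

libxml_clear_errors();

var_dump($xmlOk);
var_dump($htmlOk);
```

Since the question already uses loadHTML, the malformed markup should not be a problem there.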

    The easiest way to get what you want is probably to first get the list of titles with

    //em/text()
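In PHP, this first query could be run roughly as follows. It is a sketch that reuses the DOMDocument/DOMXPath setup from the question, with the template inlined as a hypothetical `$html` string in place of `$html_string`.

```php
<?php
// The template from the question (stands in for $html_string).
$html = <<<'HTML'
<center>
    <img src="image1">
    <br><br>
    <img src="image2">
    <br><br>
    <strong><em>TITLE1 :</em></strong> DESC1<br>
    <strong><em>TITLE2 :</em></strong> DESC2<br>
    <strong><em>TITLE3 :</em></strong> DESC3<br>
    <strong><em>TITLE4 :</em></strong> DESC4<br>
    <strong><em>TITLE5 :</em></strong> DESC5<br><br><br>
    <img src="image3">
    <br><br><br>DESC_GEN
</center>
HTML;

libxml_use_internal_errors(true); // keep any parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// //em/text() selects the text node inside each <em>, i.e. the raw titles.
$titles = [];
foreach ($xpath->query('//em/text()') as $node) {
    $titles[] = $node->nodeValue;
}

print_r($titles);
```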
    

    Then get the list of descriptions with

    //em/following::text()[1]
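The `[1]` predicate is evaluated separately for each `<em>`, so every title yields the nearest text node after it. The `following` axis skips the `<em>`'s own descendants, which is why the title text itself is not returned. A sketch under the same assumptions (template inlined as a hypothetical `$html`):

```php
<?php
$html = <<<'HTML'
<center>
    <img src="image1">
    <br><br>
    <img src="image2">
    <br><br>
    <strong><em>TITLE1 :</em></strong> DESC1<br>
    <strong><em>TITLE2 :</em></strong> DESC2<br>
    <strong><em>TITLE3 :</em></strong> DESC3<br>
    <strong><em>TITLE4 :</em></strong> DESC4<br>
    <strong><em>TITLE5 :</em></strong> DESC5<br><br><br>
    <img src="image3">
    <br><br><br>DESC_GEN
</center>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// For each <em>, take the first text node that comes after it in document order.
// That is the text right after </strong>, e.g. " DESC1" (note the leading space).
$descriptions = [];
foreach ($xpath->query('//em/following::text()[1]') as $node) {
    $descriptions[] = $node->nodeValue;
}

print_r(array_map('trim', $descriptions));
```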
    

    Then the general description with

    //center/text()[last()]
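`text()[last()]` picks the last text-node child of `<center>`; in this template that is `DESC_GEN` plus the newline before `</center>`, so it needs trimming. A sketch, again with the template inlined as a hypothetical `$html`:

```php
<?php
$html = <<<'HTML'
<center>
    <img src="image1">
    <br><br>
    <img src="image2">
    <br><br>
    <strong><em>TITLE1 :</em></strong> DESC1<br>
    <strong><em>TITLE2 :</em></strong> DESC2<br>
    <strong><em>TITLE3 :</em></strong> DESC3<br>
    <strong><em>TITLE4 :</em></strong> DESC4<br>
    <strong><em>TITLE5 :</em></strong> DESC5<br><br><br>
    <img src="image3">
    <br><br><br>DESC_GEN
</center>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Last text-node child of <center>: "DESC_GEN" followed by a newline.
$last = $xpath->query('//center/text()[last()]');
$general = $last->length > 0 ? trim($last->item(0)->nodeValue) : null;

var_dump($general);
```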
    

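    Finally, just do some string manipulation to get it into the form you want: trim the surrounding whitespace, strip the trailing " :" from each title, and pair each title with the description at the same position in the second list.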
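Putting the three queries together, one possible version of that string manipulation is sketched below. It assumes both node lists come back in document order, so the title and description at the same index belong together; the key `general` follows the question's expected output.

```php
<?php
$html = <<<'HTML'
<center>
    <img src="image1">
    <br><br>
    <img src="image2">
    <br><br>
    <strong><em>TITLE1 :</em></strong> DESC1<br>
    <strong><em>TITLE2 :</em></strong> DESC2<br>
    <strong><em>TITLE3 :</em></strong> DESC3<br>
    <strong><em>TITLE4 :</em></strong> DESC4<br>
    <strong><em>TITLE5 :</em></strong> DESC5<br><br><br>
    <img src="image3">
    <br><br><br>DESC_GEN
</center>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$titles       = $xpath->query('//em/text()');
$descriptions = $xpath->query('//em/following::text()[1]');
$general      = $xpath->query('//center/text()[last()]');

$result = [];
for ($i = 0; $i < $titles->length; $i++) {
    $desc = $descriptions->item($i);
    if ($desc === null) {
        break; // lists out of step; stop rather than mis-pair
    }
    // "TITLE1 :" -> "TITLE1"; " DESC1" -> "DESC1"
    $title = rtrim(trim($titles->item($i)->nodeValue), ' :');
    $result[$title] = trim($desc->nodeValue);
}
if ($general->length > 0) {
    $result['general'] = trim($general->item(0)->nodeValue);
}

print_r($result);
```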


    Note that the actual XPath expressions may vary depending on the specific HTML document. However, the above should work for the template you provided.
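A hedged alternative, not part of the original answer, that keeps each title next to its own description: loop over the `<em>` elements and evaluate the relative expression `following::text()[1]` with each `<em>` as the context node (the second argument of DOMXPath::query). This avoids relying on two separate lists staying aligned. The general description is read here with DOMXPath::evaluate and the XPath `normalize-space()` function, which also strips the surrounding whitespace.

```php
<?php
$html = <<<'HTML'
<center>
    <img src="image1">
    <br><br>
    <img src="image2">
    <br><br>
    <strong><em>TITLE1 :</em></strong> DESC1<br>
    <strong><em>TITLE2 :</em></strong> DESC2<br>
    <strong><em>TITLE3 :</em></strong> DESC3<br>
    <strong><em>TITLE4 :</em></strong> DESC4<br>
    <strong><em>TITLE5 :</em></strong> DESC5<br><br><br>
    <img src="image3">
    <br><br><br>DESC_GEN
</center>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$result = [];
foreach ($xpath->query('//strong/em') as $em) {
    // Relative expression, evaluated with this <em> as the context node.
    $desc  = $xpath->query('following::text()[1]', $em)->item(0);
    $title = rtrim(trim($em->textContent), ' :');
    $result[$title] = $desc === null ? '' : trim($desc->nodeValue);
}

// evaluate() returns a plain string for a string-valued XPath expression.
$result['general'] = $xpath->evaluate('normalize-space(//center/text()[last()])');

print_r($result);
```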
