dsf12313 2018-06-14 11:42

PHP 7: XPath - how can I simplify this query?

From the HTML page https://www.topazlabs.com/downloads I want to extract the Topaz ReMask version number for Windows as a string: v5.0.1

  1. I download the HTML with curl.

  2. I run an XPath query like this:

 ->finder->query("//div[contains(@class, 'wpb_wrapper')]/.//a[text()[contains(.,'Topaz ReMask')]]/../../../div");

OR

...->finder->query("//div[contains(@class, 'wpb_wrapper')]//a[text()[contains(.,'Topaz ReMask')]]/../../../div");

  3. Then I search the returned DIV nodes for the one that contains both strings "/" and "(Win)", something like this: $versionString = Find($nodes, "/", "(Win)");

  4. I process that text to extract only the Windows version.
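Steps 3 and 4 are where most of the custom code lives. The question does not show `Find()`, so the sketch below swaps in a regular expression for the text processing. `extractWindowsVersion()` is a hypothetical helper, and it assumes the text keeps the `vX.Y.Z (Win)` shape seen in the page excerpt.

```php
<?php
// Hypothetical helper (not from the question): returns the version that
// directly precedes "(Win)", or null when there is none.
// Assumes the text looks like "v5.0.3 (Mac) / v5.0.1 (Win)".
function extractWindowsVersion(string $text): ?string
{
    // Capture "v" followed by digits and dots, right before "(Win)".
    if (preg_match('/(v[\d.]+)\s*\(Win\)/', $text, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Text as it appears inside the <em> element, including trailing whitespace.
$text = "v5.0.3 (Mac) / v5.0.1 (Win)\n        ";
$version = extractWindowsVersion($text);
var_dump($version); // string(6) "v5.0.1"
```

Because the pattern anchors on "(Win)" rather than on the "/" separator, it does not depend on the Mac and Windows entries appearing in a particular order.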

It works, but can it be simplified?

The relevant part of the page's HTML is:

...
<div class="wpb_wrapper">
  <div class="vc_empty_space" style="height: 20px">
    <span class="vc_empty_space_inner">
    </span>
  </div>
  <div id="mpc_textblock-975b2251c2a82c7" class="mpc-textblock mpc-init mpc-typography--preset_2 ">
    <p>
      <a href="/remask" target="blank">Topaz ReMask</a>
    </p>
  </div>
  <div class="mpc-tooltip-wrap" data-id="mpc_textblock-615b2251c2a8c4a">
    <div id="mpc_textblock-615b2251c2a8c4a" class="mpc-textblock mpc-init mpc-typography--preset_0 ">
      <p>
        <em>v5.0.3 (Mac) / v5.0.1 (Win)
        </em>
      </p>
    </div>
    <div id="mpc_tooltip-925b2251c2a8d2f" class="mpc-tooltip mpc-init mpc-typography--preset_4 mpc-position--left mpc-can-hover mpc-trigger--hover ">Mac Updated November 4, 2016
      <br>Windows Updated November 21, 2016
      <div class="mpc-arrow">
      </div>
    </div>
  </div>
  <div id="mpc_textblock-475b2251c2a9601" class="mpc-textblock mpc-init ">
    <p>The quickest and easiest way to mask your photo.
    </p>
  </div>
</div>
...

  • duancaishun4812 2018-08-02 08:55

    Well, you could base it on the text content alone. With DOMXPath::evaluate() you can fetch the string directly:

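    // $html holds the page downloaded with curl, as in the question
    libxml_use_internal_errors(true); // real-world HTML triggers parser warnings
    $document = new DOMDocument();
    $document->loadHTML($html);
    $xpath = new DOMXPath($document);
    
    // Select the text after "Windows Updated " inside the Topaz ReMask block
    $expression = "substring-after(
      //div[contains(.//p, 'Topaz ReMask')]//text()[starts-with(., 'Windows Updated ')],
      'Windows Updated '
    )";
    
    var_dump($xpath->evaluate($expression));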
    

    Output:

    string(24) "November 21, 2016
          "
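The result still carries the newline and indentation that follow the date in the markup. Wrapping the expression in the standard XPath 1.0 function `normalize-space()` should trim that. The sketch below runs against a hand-trimmed stand-in for the page fragment (an assumption, not the live page).

```php
<?php
// Minimal stand-in for the page markup (assumption: only the parts the
// expression relies on are kept).
$html = '<div class="wpb_wrapper">'
    . '<p><a href="/remask">Topaz ReMask</a></p>'
    . "<div>Mac Updated November 4, 2016\n<br>Windows Updated November 21, 2016\n      <div></div></div>"
    . '</div>';

libxml_use_internal_errors(true); // silence warnings from real-world HTML
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// normalize-space() strips leading/trailing whitespace and collapses
// inner runs of whitespace, so the trailing newline and indentation go away.
$date = $xpath->evaluate(
    "normalize-space(substring-after(
      //div[contains(.//p, 'Topaz ReMask')]//text()[starts-with(., 'Windows Updated ')],
      'Windows Updated '
    ))"
);
var_dump($date); // string(17) "November 21, 2016"
```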
    
    The XPath expression, step by step:
    • get any div whose first p descendant contains the text Topaz ReMask...
      //div[contains(.//p, 'Topaz ReMask')]
    • ...then its descendant text nodes that start with "Windows Updated "...
      //div[contains(.//p, 'Topaz ReMask')]//text()[starts-with(., 'Windows Updated ')]
    • ...and extract the text after "Windows Updated " from the first such node:
        substring-after(
          //div[contains(.//p, 'Topaz ReMask')]//text()[starts-with(., 'Windows Updated ')],
          'Windows Updated '
        )
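The same idea can return the version string the question actually asks for (v5.0.1) instead of the update date. This is a sketch, assuming each product sits in its own wrapper as in the HTML excerpt in the question and that the `<em>` keeps the "Mac / Win" format: select the `em` that contains "(Win)", keep what follows " / ", then cut at " (Win)". The markup below is a hand-trimmed stand-in, not the live page.

```php
<?php
// Minimal stand-in for the page markup (assumption: only the parts the
// expression relies on are kept).
$html = '<div class="wpb_wrapper">'
    . '<div><p><a href="/remask">Topaz ReMask</a></p></div>'
    . "<div><div><p><em>v5.0.3 (Mac) / v5.0.1 (Win)\n        </em></p></div></div>"
    . '</div>';

libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// 1. find the <em> mentioning "(Win)" inside the Topaz ReMask block,
// 2. keep the part after " / "  -> "v5.0.1 (Win)...",
// 3. keep the part before " (Win)" -> "v5.0.1".
$version = $xpath->evaluate(
    "substring-before(
      substring-after(
        //div[contains(.//p, 'Topaz ReMask')]//em[contains(., '(Win)')],
        ' / '
      ),
      ' (Win)'
    )"
);
var_dump($version); // string(6) "v5.0.1"
```

If either separator is missing, `substring-after()`/`substring-before()` return an empty string rather than failing, so an empty result is a signal that the page layout has changed.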
    