dongque1462 2014-09-01 14:59

Using preg_match to find a link in page source

I would like to change this using preg_match:

<li class="fte_newsarchivelistleft" style="clear: both; padding-left:0px;"><a class="fte_standardlink fte_edit" href="news,2480143,3-kolejka-sezonu-2014-2015.html">3 kolejka sezonu 2014/2015&nbsp;&raquo;&raquo;</a></li>
                      <li class="fte_newsarchivelistright" style="height: 25px;">komentarzy: <span class="fte_standardlink">[0]</span></li>

To this:

news,2480143,3-kolejka-sezonu-2014-2015.html

How can I do it? I'm trying to use preg_match, but that link is too complicated...


  • drju37335 2014-09-01 15:07

    Using preg_match would indeed be too complicated. As has been said on this site many times before, regex and HTML don't mix well: regular expressions are not suited to parsing markup. A DOM parser, however, is:

    $dom = new DOMDocument;//create parser
    $dom->loadHTML($htmlString);//parse the HTML (a fragment gets wrapped in <html><body>)
    $xpath = new DOMXPath($dom);//create an XPath instance for $dom, so we can query it using XPath
    $elemsWithHref = $xpath->query('//*[@href]');//get every node that has an href attribute
    $hrefs = array();//all href values
    foreach ($elemsWithHref as $node)
    {
        $hrefs[] = $node->getAttribute('href');//collect the attribute value
    }
    

    After this, it's a simple matter of processing the values in $hrefs, which will be an array of strings, each of which is the value of an href attribute.
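
As a quick check, the first snippet can be wrapped in a function and run on the HTML from the question. This is a minimal sketch: the function name `extractHrefs` is hypothetical, and silencing libxml warnings with `libxml_use_internal_errors()` is an addition that is not part of the answer above. It assumes PHP 7 or later with the DOM extension enabled.

```php
<?php
// Hypothetical helper wrapping the DOMDocument + DOMXPath approach above.
function extractHrefs(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $hrefs = [];
    // Every element that carries an href attribute, anywhere in the document.
    foreach ($xpath->query('//*[@href]') as $node) {
        $hrefs[] = $node->getAttribute('href');
    }
    return $hrefs;
}

// The markup from the question, joined into one string.
$html = '<li class="fte_newsarchivelistleft" style="clear: both; padding-left:0px;">'
      . '<a class="fte_standardlink fte_edit" href="news,2480143,3-kolejka-sezonu-2014-2015.html">'
      . '3 kolejka sezonu 2014/2015&nbsp;&raquo;&raquo;</a></li>'
      . '<li class="fte_newsarchivelistright" style="height: 25px;">komentarzy: '
      . '<span class="fte_standardlink">[0]</span></li>';

$hrefs = extractHrefs($html);
print_r($hrefs);
```

For the question's snippet, `$hrefs` holds a single element, `news,2480143,3-kolejka-sezonu-2014-2015.html`, which is exactly the string the question asks for.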

    Another example of using DOM parsers and XPath (to show you what they can do) can be found here.
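
To illustrate what XPath can do beyond `//*[@href]`, a more targeted query can select only the href attributes of the news links and return them directly as attribute nodes. This sketch assumes, based only on the snippet in the question, that those links are `<a>` elements with the `fte_edit` class inside `<li class="fte_newsarchivelistleft">`; the function name `extractNewsLinks` and the second, `comments.html` link in the sample input are hypothetical.

```php
<?php
// Sketch: select only the href attributes of <a class="... fte_edit ..."> elements
// inside <li class="fte_newsarchivelistleft"> (an assumption drawn from the question's snippet).
function extractNewsLinks(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // concat/normalize-space matches each class as a whole word,
    // so a class such as "fte_edit_x" would not match.
    $query = "//li[contains(concat(' ', normalize-space(@class), ' '), ' fte_newsarchivelistleft ')]"
           . "/a[contains(concat(' ', normalize-space(@class), ' '), ' fte_edit ')]/@href";

    $links = [];
    foreach ($xpath->query($query) as $attr) {
        // The query selects DOMAttr nodes, so ->value is the URL itself.
        $links[] = $attr->value;
    }
    return $links;
}

// Sample input; the comments.html link is hypothetical and should be skipped.
$html = '<li class="fte_newsarchivelistleft"><a class="fte_standardlink fte_edit" '
      . 'href="news,2480143,3-kolejka-sezonu-2014-2015.html">3 kolejka</a></li>'
      . '<li class="fte_newsarchivelistright">komentarzy: '
      . '<a class="fte_standardlink" href="comments.html">[0]</a></li>';

print_r(extractNewsLinks($html));
```

Here `//*[@href]` would return both URLs, while the targeted query keeps only the news link.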

    Replacing the nodes with their href values is a simple matter of:

    • getting the parent node
    • constructing a text node
    • calling DOMNode::replaceChild on the parent
    • finishing up by calling save to write to a file, or saveHTML or saveXML to get the DOM as a string

    An example:

    $dom = new DOMDocument;//create parser
    $dom->loadHTML($htmlString);
    $xpath = new DOMXPath($dom);//create an XPath instance for $dom, so we can query it using XPath
    $elemsWithHref = $xpath->query('//*[@href]');//get every node that has an href attribute
    foreach ($elemsWithHref as $node)
    {
        $parent = $node->parentNode;
        $replace = new DOMText($node->getAttribute('href'));//create a text node holding the href value
        $parent->replaceChild($replace, $node);//replace $node with the $replace text node
    }
    $newString = $dom->saveHTML();//serialise the modified DOM back to a string
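
When the input is only a fragment, `loadHTML()` wraps it in implied `<html>` and `<body>` elements (plus a doctype), and `saveHTML()` called without arguments returns all of them. The sketch below, under that assumption, serialises only the children of `<body>` by passing each node to `saveHTML()`. The function name `replaceLinksWithHrefs` is hypothetical; `createTextNode()` is used instead of `new DOMText`, which has the same effect here, and the matches are copied into an array first as a precaution so the replacements cannot interfere with the loop.

```php
<?php
// Sketch: replace each element that has an href with a text node holding that href,
// then return only the original fragment (without the implied <html>/<body> wrapper).
function replaceLinksWithHrefs(string $html): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Copy the matches into a plain array before modifying the tree.
    $nodes = iterator_to_array($xpath->query('//*[@href]'));
    foreach ($nodes as $node) {
        $text = $dom->createTextNode($node->getAttribute('href'));
        $node->parentNode->replaceChild($text, $node);
    }

    // Serialise each child of <body> separately to drop the wrapper elements.
    $out = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$html = '<li class="fte_newsarchivelistleft" style="clear: both; padding-left:0px;">'
      . '<a class="fte_standardlink fte_edit" href="news,2480143,3-kolejka-sezonu-2014-2015.html">'
      . '3 kolejka sezonu 2014/2015&nbsp;&raquo;&raquo;</a></li>'
      . '<li class="fte_newsarchivelistright" style="height: 25px;">komentarzy: '
      . '<span class="fte_standardlink">[0]</span></li>';

$result = replaceLinksWithHrefs($html);
echo $result;
```

For the question's input, the first `<li>` then contains only the text `news,2480143,3-kolejka-sezonu-2014-2015.html`, and the `<span>` in the second `<li>` is left untouched because it has no href.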
    
    This answer was accepted by the asker as the best answer.