doushi7805 2010-06-30 15:57
Accepted

Find and replace keywords with hyperlinks in an HTML fragment using PHP DOM

I'm trying to use the simple_html_dom PHP class to create a find-and-replace function that looks for keywords and replaces each one with a link to its definition, using the keyword as the link text.

How can I find "Dexia" and replace it with <a href="info.php?tag=dexia">Dexia</a> using this class, inside a string such as <div><p>The CEO of the Dexia bank has just decided to retire.</p></div>?


1 answer

  • doushibu2453 2010-06-30 17:00

    That's somewhat tricky, but you could do it this way:

    $html = <<< HTML
    <div><p>The CEO of the Dexia bank <em>has</em> just decided to retire.</p></div>
    HTML;
    

    I've added an emphasis element just to illustrate that it works with inline elements too.

    Setup

    $dom = new DOMDocument;
    $dom->formatOutput = TRUE;
    $dom->loadXML($html);
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//text()[contains(., "Dexia")]');
    

    The interesting part above is, of course, the XPath. It queries the loaded DOM for all DOMText nodes containing the needle "Dexia". The result is a DOMNodeList, as usual.
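
    To see what the query actually selects, here is a minimal sketch that lists the matches. It assumes the same sample markup as above, written as a plain single-quoted string; the variable names $allText and $matches are only illustrative.

    ```php
    <?php
    // Assumption: the sample markup from above as a plain string.
    $html = '<div><p>The CEO of the Dexia bank <em>has</em> just decided to retire.</p></div>';

    $dom = new DOMDocument;
    $dom->loadXML($html);
    $xpath = new DOMXPath($dom);

    // All text nodes first, then only those whose text contains the needle.
    $allText = $xpath->query('//text()');
    $matches = $xpath->query('//text()[contains(., "Dexia")]');

    echo $allText->length, " text nodes, ", $matches->length, " match\n";
    foreach ($matches as $node) {
        // The quotes make the trailing space of the text node visible.
        echo '"', $node->nodeValue, '" in <', $node->parentNode->nodeName, ">\n";
    }
    ```

    With this input the document holds three text nodes, and only the first one (the text before the EM element) contains "Dexia".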

    The replacement

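    foreach($nodes as $node) {
        $link     = '<a href="info.php?tag=dexia">Dexia</a>';
        // Escape &, < and > so the text stays well-formed XML for appendXML()
        $text     = htmlspecialchars($node->wholeText, ENT_NOQUOTES, 'UTF-8');
        $replaced = str_replace('Dexia', $link, $text);
        // Parse the resulting string into a fragment and swap it in for the text node
        $newNode  = $dom->createDocumentFragment();
        $newNode->appendXML($replaced);
        $node->parentNode->replaceChild($newNode, $node);
    }
    echo $dom->saveXML($dom->documentElement);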
    

    For wholeText, the found $node contains only the string "The CEO of the Dexia bank " rather than the full text of the P element. That is because the text node ends where its sibling, the EM element after "bank", begins. I create the link as a string instead of a node and replace all occurrences of "Dexia" in wholeText with it (regardless of word boundaries; respecting them would be a good job for a regex). Then I create a DocumentFragment from the resulting string and replace the DOMText node with it.
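
    As a rough sketch of the regex variant mentioned above, the replacement can be limited to whole words with \b. The sample markup (including the made-up word "NotDexiaBank") and the variable names are assumptions for illustration only. The text is escaped first, because appendXML() expects well-formed XML and a raw "&" or "<" in the text would break it.

    ```php
    <?php
    // Assumption: illustrative markup with one non-word-boundary occurrence.
    $html = '<div><p>Dexia and NotDexiaBank: the Dexia bank.</p></div>';

    $dom = new DOMDocument;
    $dom->loadXML($html);
    $xpath = new DOMXPath($dom);

    foreach ($xpath->query('//text()[contains(., "Dexia")]') as $node) {
        // Escape &, < and > so the string is valid input for appendXML().
        $escaped = htmlspecialchars($node->nodeValue, ENT_NOQUOTES, 'UTF-8');

        // \b restricts the match to whole words; $0 re-inserts the matched text.
        $replaced = preg_replace(
            '/\bDexia\b/u',
            '<a href="info.php?tag=dexia">$0</a>',
            $escaped,
            -1,
            $count
        );
        if ($count === 0) {
            continue; // "Dexia" only occurred inside a longer word
        }

        $fragment = $dom->createDocumentFragment();
        $fragment->appendXML($replaced);
        $node->parentNode->replaceChild($fragment, $node);
    }

    echo $dom->saveXML($dom->documentElement), "\n";
    ```

    Here the two standalone "Dexia" words become links, while "NotDexiaBank" is left untouched.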

    W3C vs PHP

    Using DOMDocumentFragment::appendXML() is a non-standard approach, because the method is not part of the W3C DOM specification.

    If you wanted to do the replacement with the standard API, you would first create the A element as a new DOMElement. Then you would find the offset of "Dexia" in the nodeValue of the DOMText and split the DOMText node into two nodes at that position. Next, remove "Dexia" from the start of the returned sibling and insert the link element before that sibling. Repeat this procedure on the sibling node until it contains no more "Dexia" strings. Here is how to do it for one occurrence of Dexia:

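    foreach($nodes as $node) {
        // Build the link as a real element node
        $link = $dom->createElement('a', 'Dexia');
        $link->setAttribute('href', 'info.php?tag=dexia');
        // strpos() returns a byte offset, which is fine for ASCII text;
        // for multibyte UTF-8 text, mb_strpos()/mb_strlen() are the safer choice
        $offset  = strpos($node->nodeValue, 'Dexia');
        // $node keeps the text before the offset, $newNode gets the rest
        $newNode = $node->splitText($offset);
        $newNode->deleteData(0, strlen('Dexia'));
        $node->parentNode->insertBefore($link, $newNode);
    }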
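
    The "repeat this procedure" step could look roughly like the sketch below, which keeps splitting the remaining sibling until no needle is left. The sample markup and the names $needle and $rest are assumptions for illustration, and the code assumes the mbstring extension is available so that offsets are counted in characters.

    ```php
    <?php
    // Assumption: illustrative markup with several occurrences.
    $html = '<div><p>Dexia owns Dexia Bank, says <em>Dexia</em>.</p></div>';

    $dom = new DOMDocument;
    $dom->loadXML($html);
    $xpath  = new DOMXPath($dom);
    $needle = 'Dexia';

    foreach ($xpath->query('//text()[contains(., "Dexia")]') as $node) {
        // Keep going while the current text node still contains the needle.
        while (($offset = mb_strpos($node->nodeValue, $needle, 0, 'UTF-8')) !== false) {
            // $node keeps the text before the needle, $rest starts with it.
            $rest = $node->splitText($offset);
            $rest->deleteData(0, mb_strlen($needle, 'UTF-8'));

            // A fresh element per occurrence; one node cannot sit in two places.
            $link = $dom->createElement('a', $needle);
            $link->setAttribute('href', 'info.php?tag=dexia');
            $rest->parentNode->insertBefore($link, $rest);

            // Continue searching in the remainder.
            $node = $rest;
        }
    }

    echo $dom->saveXML($dom->documentElement), "\n";
    ```

    Each pass leaves the text before the match in place, inserts a link, and moves on to the remainder, so all three occurrences in this sample end up linked.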
    

    And finally the output

    <div>
      <p>The CEO of the <a href="info.php?tag=dexia">Dexia</a> bank <em>has</em> just decided to retire.</p>
    </div>
    
    This answer was selected by the asker as the best answer.
