duandanxiu6965 2010-08-18 15:51

Regex to match an HTML tag that does not contain another tag

I am writing a regex find/replace that inserts a <span> into every <a href> in a file that does not already contain a <span>. Other tags, such as <img> or <b>, are allowed inside the <a href>.

Currently I have this regex:
Find: (<a[^>]+?style=".*?color:#(\w{6}).*?".*?>)(.+?)(<\/a>)
Replace: '$1<span style="color:#$2;">$3</span>$4'

It works great, except that if I run it over the same file again, it inserts a <span> inside the existing <span> and things get messy.

Target Example:

We want it to ignore this:
<a href="http://mywebiste.com/link1.html" target="_blank" style="color:#bfbcba; text-decoration:underline;"><span style="color:#bfbcba;">Howdy</span></a>

But not this:
<a href="http://mywebiste.com/link1.html" target="_blank" style="color:#bfbcba; text-decoration:underline;">Howdy</a>

Or this:
<a href="http://mywebiste.com/link1.html" target="_blank" style="color:#bfbcba; text-decoration:underline;"><img src="myimg.gif" />Howdy</a>
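
For reference, one way to make the find/replace safe to run twice is a negative lookahead placed right after the opening tag. The sketch below is a tightened variant of the pattern above, not the original: it assumes the existing <span> is always the first thing inside the anchor, and the function name addSpans is hypothetical. A <span> that appears later in the anchor, after an <img> for example, would not be detected.

```php
<?php
// Hypothetical helper: wraps anchor contents in a colored <span> unless the
// contents already start with a <span>.
function addSpans(string $html): string
{
    $pattern = '~(<a\b[^>]+?style="[^"]*?color:#(\w{6})[^"]*"[^>]*>)' // $1: opening tag, $2: color
             . '(?!\s*<span\b)'   // negative lookahead: skip if the contents start with <span
             . '(.+?)(</a>)~is';  // $3: contents, $4: closing tag

    // preg_replace() returns null on a PCRE error; fall back to the input then.
    return preg_replace($pattern, '$1<span style="color:#$2;">$3</span>$4', $html) ?? $html;
}
```

Because the attribute value is matched with [^"]* and the rest of the tag with [^>]*, the first group cannot run past the end of the opening tag, so the lookahead is always tested at the start of the anchor's contents.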

--EDIT--

Using the PHP DOM library as suggested in the comments, this is what I have so far:

$doc = new DOMDocument();
$doc->loadHTML($input);
$tags = $doc->getElementsByTagName('a');
foreach ($tags as $tag) {
    $spancount = $tag->getElementsByTagName("span")->length;
    if($spancount == 0){
        $element = $doc->createElement('span');
        $tag->appendChild($element);
    }
}

echo $doc->saveHTML();

Currently it detects whether an anchor contains a span and, if it does not, appends an empty span inside the anchor. However, I have yet to figure out how to move the original contents of the anchor into the span.

1 answer

  • dongzhe6287 2010-08-18 15:55

    Don't use regex for this; it is a poor fit for parsing HTML.

    Use a DOM library instead. Call getElementsByTagName('a'), iterate over each anchor, and check whether it contains a span by testing the length property of getElementsByTagName('span'). If it doesn't, create a new span with document.createElement('span'), move the anchor's existing children (starting from its firstChild) into that span with appendChild, and then append the span to the anchor.
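
The steps above can be sketched in PHP roughly as follows. This is a minimal sketch, not a definitive implementation: the function name wrapAnchorContents is hypothetical, and copying the color from the anchor's style attribute is an assumption carried over from the regex in the question.

```php
<?php
// Hypothetical helper: wraps the contents of every <a> that has no <span>
// descendant in a new <span>, copying the anchor's color when one is found.
function wrapAnchorContents(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parse warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    // getElementsByTagName() returns a live list, so copy it to an array
    // before changing the tree.
    $anchors = iterator_to_array($doc->getElementsByTagName('a'));

    foreach ($anchors as $anchor) {
        if ($anchor->getElementsByTagName('span')->length > 0) {
            continue; // this anchor already contains a span
        }

        $span = $doc->createElement('span');

        // Assumption: reuse the color pattern from the question's regex.
        if (preg_match('/color:#(\w{6})/', $anchor->getAttribute('style'), $m)) {
            $span->setAttribute('style', 'color:#' . $m[1] . ';');
        }

        // appendChild() moves a node that is already in the tree, so
        // firstChild points at the next remaining child on every pass.
        while ($anchor->firstChild !== null) {
            $span->appendChild($anchor->firstChild);
        }
        $anchor->appendChild($span);
    }

    return $doc->saveHTML();
}
```

Note that saveHTML() without arguments serializes a whole document, so the result includes a doctype and html/body wrappers even when the input was a fragment.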

    EDIT: As for grabbing the inner HTML of the anchor when it contains several nodes, try this:

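    <?php
    // Serialize the children of $node (its "inner HTML") to a string.
    function innerHTML($node){
      $doc = new DOMDocument();
      // Import a deep copy of each child into a scratch document.
      foreach ($node->childNodes as $child)
        $doc->appendChild($doc->importNode($child, true));

      return $doc->saveHTML();
    }

    // $anchorRef is the anchor's DOMElement, e.g. $tag in the question's loop.
    $html = innerHTML( $anchorRef );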

    This may also help you out: Change innerHTML of a php DOMElement
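
As an alternative to building a scratch document, DOMDocument::saveHTML() accepts a node argument from PHP 5.3.6 onward, so each child can be serialized directly. A minimal sketch, with the hypothetical name innerHtmlOf:

```php
<?php
// Hypothetical helper: returns the serialized children of $node.
// Assumes PHP 5.3.6 or later, where saveHTML() accepts a node argument.
function innerHtmlOf(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // Serialize only this child rather than the whole document.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}
```

This avoids the doctype and wrapper elements that a full-document saveHTML() call can add.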
