dongquelu1239 2014-09-17 15:00

Regular expression with negative lookahead and XHTML

I have the following regular expression which performs a negative lookahead.

/\b(\w+)\b(?![^<]*</{0,1}(a|script|link|img)>)/gsmi

What I want is to match all text, including text that contains HTML, except inside a, script, link and img elements. The problem occurs when an img tag is used.

An img tag has no closing tag, so the expression does not exclude img tags.

<p>This is a sample text <a href="#">with</a> a link and an image <img src="" alt="" /> and so on</p>

The regular expression should not match the anchor (not even the text between its opening and closing tags), and it should not match anything inside the img tag.
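A quick way to see the problem is to run the first expression over the sample above. This sketch assumes PHP's PCRE functions (the language the answer below uses). The pattern is adapted slightly: `preg_match_all()` is already global and has no `g` flag, and `~` is used as the delimiter so the `/` in `</` needs no escaping.

```php
<?php
// The question's first pattern, adapted for PHP (assumption): '~' delimiter,
// and no 'g' flag because preg_match_all() always finds every match.
$pattern = '~\b(\w+)\b(?![^<]*</{0,1}(a|script|link|img)>)~smi';

$html = '<p>This is a sample text <a href="#">with</a> a link and an image <img src="" alt="" /> and so on</p>';

preg_match_all($pattern, $html, $matches);
$words = $matches[1];

// "with" is followed by "</a>", so the negative lookahead rejects it.
var_dump(in_array('with', $words, true)); // bool(false)

// The attribute names inside the self-closing img tag are never followed by
// "</img>" or a bare "<img>", so they are still matched.
var_dump(in_array('src', $words, true));  // bool(true)
var_dump(in_array('alt', $words, true));  // bool(true)
```

The lookahead only recognizes a following `</a>`, `<a>`, `</img>` or `<img>` (likewise for script and link) with nothing between the tag name and `>`. A self-closing `<img src="" alt="" />` never produces such a sequence, so the words inside it pass.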

I think I am almost there, but I can't get it to work properly. This is what I've tried as well:

/\b(\w+)\b(?![^<]*</{0,1}(a|script|link)>)(?![^\<img]*>)/gsmi

Somehow the last one only works (for the img tag) when the img tag contains no "i", "m" or "g". As soon as I add something like height=, it stops working.
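The behavior described above is consistent with how `[^\<img]` is parsed: the square brackets make it a character class, which matches one character that is not `<`, `i`, `m` or `g`. It does not exclude the string `<img` as a whole, so any `i`, `m` or `g` inside the tag (as in `height`) stops it before it reaches `>`. A minimal check in PHP, assuming PCRE semantics:

```php
<?php
// [^\<img] matches ONE character that is not '<', 'i', 'm' or 'g'.
// It does not mean "anything except the string <img".
$classPattern = '/^[^\<img]*>/';

// No i, m or g before the '>', so the class can run all the way to '>'.
$plainAttrs = preg_match($classPattern, 'src="" alt="" />');

// "height" contains 'i' and 'g', so the class stops there and '>' is never reached.
$withHeight = preg_match($classPattern, 'src="" height="10" />');

var_dump($plainAttrs, $withHeight); // prints int(1) and int(0)
```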

Edit: The goal is to extract all words from the text except those inside anchor and image tags. The input might also contain no HTML at all.


  • dopuz8728 2014-09-17 19:15

    I know you asked for a regex, but here is a solution using something that won't summon Cthulhu.


    Example:

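    <?php
    $html = <<<'HTML'
    <p>This is a <em>sample</em> text <a href="#">with</a>
     a link and an image <img src="" alt="" /> and so on</p>
    HTML;
    
    $dom = new DOMDocument();
    // Parse the fragment as-is: no implied <html>/<body> wrapper, no doctype.
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $xpath = new DOMXPath($dom);
    
    // Remove every a, link, script and img element, including its contents.
    // query() returns a snapshot, so removing nodes inside the loop is safe.
    foreach($xpath->query('//a | //link | //script | //img') as $node) {
        $node->parentNode->removeChild($node);
    }
    
    // Print the remaining markup.
    echo $dom->saveHTML();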
    

    Output:

    <p>This is a <em>sample</em> text 
     a link and an image  and so on</p>
    

    I recommend considering it as an option.
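The code above removes the unwanted elements but stops short of the question's stated goal (a list of words), and the question notes the input may contain no HTML at all. A possible extension, sketched under those assumptions: wrap the input in a `<div>` so plain text also parses into a single root, strip the same elements, then collect runs of word characters from the remaining text. `extractWords()` is a hypothetical helper name, not part of any library.

```php
<?php
// Hypothetical helper: words from an HTML fragment or plain text,
// skipping anything inside a, link, script and img elements.
function extractWords(string $input): array
{
    // Collect libxml parse warnings instead of printing them.
    libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // Wrapping in a <div> gives one root element even when the input is
    // plain text, since implied <html>/<body> tags are disabled.
    $dom->loadHTML('<div>' . $input . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // query() returns a snapshot, so removing nodes inside the loop is safe.
    foreach ($xpath->query('//a | //link | //script | //img') as $node) {
        $node->parentNode->removeChild($node);
    }

    // Split the remaining text of the wrapper <div> into runs of word characters.
    preg_match_all('/\w+/', $dom->documentElement->textContent, $matches);
    return $matches[0];
}

$fromHtml = extractWords('<p>This is a sample text <a href="#">with</a> a link and an image <img src="" alt="" /> and so on</p>');
$fromText = extractWords('no html at all');

print_r($fromHtml);
print_r($fromText);
```

`loadHTML()` assumes ISO-8859-1 when the input declares no encoding, and `\w` without the `u` modifier does not match non-ASCII letters in UTF-8 text, so non-ASCII input would need extra handling.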

    The asker selected this answer as the best answer.