duanjiushu5063 2014-02-09 06:37
Accepted

How to match links with specific text using a PHP regular expression

I'm looking for a regular expression in PHP that matches anchors carrying a specific attribute, target="_parent". I would like to get anchors with text, like this:

preg_match_all('~<a href="http://" target="_parent">Text here</a>~', $subject, $matches, PREG_SET_ORDER);

HTML:

<a href="http://" target="_parent">

    <FONT style="font-size:10pt" color=#000000 face="Tahoma">
        <DIV><B>Text</B> - Text </DIV>
    </FONT>

</a>

</DIV>

2 answers

  • douan4347 2014-02-09 06:44

    To be honest, the best approach is not to use a regular expression at all. A regex will miss many variations of the link markup, especially if you cannot guarantee that the links are always generated the same way.
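
    As a small illustration of that point, the sketch below runs one hypothetical pattern (my own, not the asker's) against three anchors that a browser treats identically. The pattern and the variant labels are assumptions made up for this example.

```php
<?php
// Hypothetical pattern: expects href first, then target, both double-quoted.
$pattern = '~<a href="[^"]*" target="_parent">(.*?)</a>~s';

// Three spellings of the same link.
$variants = [
    'same order'    => '<a href="http://" target="_parent">Text here</a>',
    'swapped order' => '<a target="_parent" href="http://">Text here</a>',
    'single quotes' => "<a href='http://' target='_parent'>Text here</a>",
];

// Record whether the pattern matches each variant.
$results = [];
foreach ($variants as $label => $html) {
    $results[$label] = preg_match($pattern, $html) === 1;
}

var_export($results);
```

    Only the first variant matches. Swapping the attribute order or the quote style is enough to make the pattern miss the link.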

    A better way is to use a DOM parser, such as PHP's DOMDocument.

    <?php

    $html = '<a href="http://" target="_parent">Text here</a>';

    // Returns an array of anchor text => href for every <a target="_parent">.
    function extractTags($html) {
        $dom = new DOMDocument;
        // Suppress the warnings libxml raises for badly formatted HTML.
        libxml_use_internal_errors(true);
        $dom->loadHTML($html);
        $sxe = simplexml_import_dom($dom);
        $nodes = $sxe->xpath("//a[@target='_parent']");

        $anchors = array();
        foreach ($nodes as $node) {
            // textContent gathers the text of all nested elements.
            $anchor = trim((string)dom_import_simplexml($node)->textContent);
            $attribs = $node->attributes();
            $anchors[$anchor] = (string)$attribs->href;
        }

        return $anchors;
    }

    print_r(extractTags($html));
    

    This will output:

    Array (
        [Text here] => http://
    )
    

    Even using it on your example:

    $html = '<a href="http://" target="_parent">

        <FONT style="font-size:10pt" color=#000000 face="Tahoma">
            <DIV><B>Text</B> - Text </DIV>
        </FONT>

    </a>

    </DIV>';
    print_r(extractTags($html));
    

    will output:

    Array (
        [Text - Text] => http://
    )
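
    One caveat with keying the result by anchor text: two links with the same text overwrite each other. A possible variant, sketched here under my own assumptions, returns a list of rows instead and queries the DOMDocument directly with DOMXPath rather than going through SimpleXML. The function name extractParentLinks and the example.com-style URLs are hypothetical.

```php
<?php
// Sketch: collect every <a target="_parent"> as a text/href row.
function extractParentLinks(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];
    foreach ($xpath->query("//a[@target='_parent']") as $a) {
        $links[] = [
            'text' => trim($a->textContent),
            'href' => $a->getAttribute('href'),
        ];
    }
    return $links;
}

// Two links share the same text; a third has a different target.
$links = extractParentLinks(
    '<a href="http://a.example/" target="_parent">Same text</a>'
    . '<a href="http://b.example/" target="_parent">Same text</a>'
    . '<a href="http://c.example/" target="_blank">Other</a>'
);
print_r($links);
```

    Both "Same text" links are kept as separate rows, and the _blank link is skipped.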
    

    If you feel that the HTML is still not clean enough for DOMDocument, I would recommend a project such as HTMLPurifier (see http://htmlpurifier.org/) to clean the HTML up first (and remove unneeded markup), then load its output into DOMDocument. Note that HTMLPurifier strips target attributes unless you allow them through its Attr.AllowedFrameTargets setting, so configure that before relying on the XPath query above.

    This answer was selected by the asker as the best answer.