duanjiushu5063 2014-02-09 06:37
Accepted

How to match links with specific text using a PHP regular expression

I'm looking for a regular expression in PHP that matches anchors carrying a specific attribute, target="_parent". I would like to get anchors with text, like:

preg_match_all('<a href="http://" target="_parent">Text here</a>', $subject, $matches, PREG_SET_ORDER);

HTML:

<a href="http://" target="_parent">

    <FONT style="font-size:10pt" color=#000000 face="Tahoma">
        <DIV><B>Text</B> - Text </DIV>
    </FONT>

</a>

</DIV>

2 answers

  • douan4347 2014-02-09 06:44

    To be honest, the best approach is not to use a regular expression at all. A regex will miss all kinds of link variations, especially if you cannot be sure that the links are always generated the same way.

    A better way is to parse the HTML with DOMDocument and select the anchors with an XPath query.

    <?php

    $html = '<a href="http://" target="_parent">Text here</a>';

    // Returns an array mapping each matching anchor's text to its href.
    function extractTags($html) {
        $dom = new DOMDocument;
        libxml_use_internal_errors(true); // because DOMDocument will complain about badly formatted HTML
        $dom->loadHTML($html);
        $sxe = simplexml_import_dom($dom);
        // Select only anchors whose target attribute is exactly "_parent"
        $nodes = $sxe->xpath("//a[@target='_parent']");

        $anchors = array();
        foreach ($nodes as $node) {
            // textContent is the anchor's text with all nested tags stripped
            $anchor = trim((string)dom_import_simplexml($node)->textContent);
            $attribs = $node->attributes();
            $anchors[$anchor] = (string)$attribs->href;
        }

        return $anchors;
    }

    print_r(extractTags($html));
    

    This will output:

    Array (
        [Text here] => http://
    )
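One detail of extractTags() worth noting: because the result is keyed by anchor text, two links with the same text would overwrite each other. The following is a minimal sketch of a variant that avoids this, assuming you are fine with a list of text/href pairs instead. It uses DOMXPath directly rather than SimpleXML and collapses runs of whitespace in the text. The function name extractParentAnchors and the example.com style URLs are made up for illustration and are not part of the original answer.

```php
<?php
// Sketch only: a DOMXPath-based variant of extractTags() that returns a list
// of ['text' => ..., 'href' => ...] pairs, so duplicate link texts are kept.
// extractParentAnchors is a hypothetical name chosen for this example.
function extractParentAnchors(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them for sloppy HTML
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $anchors = [];
    foreach ($xpath->query("//a[@target='_parent']") as $a) {
        $anchors[] = [
            // Collapse whitespace left over from nested tags and line breaks
            'text' => trim(preg_replace('/\s+/u', ' ', $a->textContent)),
            'href' => $a->getAttribute('href'),
        ];
    }
    return $anchors;
}

// Illustrative input: two anchors share the same text, one has another target.
$html = '<a href="http://a.example/" target="_parent">Same text</a>'
      . '<a href="http://b.example/" target="_parent">Same text</a>'
      . '<a href="http://c.example/" target="_blank">Other</a>';
print_r(extractParentAnchors($html));
```

With this shape both "Same text" links are kept as separate entries, whereas the text-keyed array would retain only the last one.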
    

    Even using it on your example:

    $html = '<a href="http://" target="_parent">

        <FONT style="font-size:10pt" color=#000000 face="Tahoma">
            <DIV><B>Text</B> - Text </DIV>
        </FONT>

    </a>

    </DIV>';
    print_r(extractTags($html));
    

    will output:

    Array (
        [Text - Text] => http://
    )
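To illustrate the point from the start of this answer, that a regex tied to one way of writing the link misses other ways, here is a small comparison sketch. The pattern below is a hypothetical naive regex written for this example, not something from the question, and domFinds is an illustrative helper name. Each variant is the same link written slightly differently.

```php
<?php
// Hypothetical naive pattern: expects href first, double quotes, lowercase names.
$pattern = '~<a href="([^"]*)" target="_parent">(.*?)</a>~s';

// The same link written four different ways.
$variants = [
    'as in the question' => '<a href="http://" target="_parent">Text here</a>',
    'attributes swapped' => '<a target="_parent" href="http://">Text here</a>',
    'single quotes'      => "<a href='http://' target='_parent'>Text here</a>",
    'uppercase names'    => '<A HREF="http://" TARGET="_parent">Text here</A>',
];

// Returns true if the DOM/XPath approach finds at least one matching anchor.
function domFinds(string $html): bool
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    return $xpath->query("//a[@target='_parent']")->length > 0;
}

foreach ($variants as $label => $html) {
    $regexHit = preg_match($pattern, $html) === 1;
    printf(
        "%-20s regex: %-3s DOM: %s\n",
        $label,
        $regexHit ? 'yes' : 'no',
        domFinds($html) ? 'yes' : 'no'
    );
}
```

The naive pattern should match only the first variant, while the XPath query should match all four, since the HTML parser normalizes tag and attribute names and does not depend on attribute order or quoting style.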
    

    If you feel that the HTML is still not clean enough for DOMDocument, I would recommend a project such as HTMLPurifier (see http://htmlpurifier.org/) to clean the HTML up first (and remove unneeded markup), then load its output into DOMDocument.

    This answer was selected by the asker as the best answer.