dongzhong7299 2015-08-15 13:05

Getting the link text with a regular expression

How would one parse the content inside these tags, assuming the link is dynamic?

<h3 class="lvtitle">
<a href="http://www.ebay.com/itm/Chicago-Chicago-XXX-Audio-CD-/351478948979?hash=item51d5c72473" 
 class="vip" title="Click this link to access Chicago, Chicago XXX Audio CD">
Chicago, Chicago XXX Audio CD
</a>
</h3>

What I'm after is getting the "Chicago, Chicago XXX Audio CD" part.

3 answers

  • dri8163 2015-08-15 13:35

    Parser example:

    $string = '<h3 class="lvtitle"><a href="http://www.ebay.com/itm/Chicago-Chicago-XXX-Audio-CD-/351478948979?hash=item51d5c72473"  class="vip" title="Click this link to access Chicago, Chicago XXX Audio CD">Chicago, Chicago XXX Audio CD</a></h3>';
    $doc = new DOMDocument(); //make a dom object
    $doc->loadHTML($string); // load the string into the object
    $links = $doc->getElementsByTagName('a'); //get all links
    foreach ($links as $link) { //loop through all links
        echo $link->nodeValue; //output text content of links
    }
    

    Output:

    Chicago, Chicago XXX Audio CD

    References:
    http://php.net/manual/en/domelement.getelementsbytagname.php
    http://php.net/manual/en/domdocument.loadhtml.php
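
If the page contains other links besides the one you want, looping over every `a` element will print them all. A minimal sketch of one way to narrow it down, assuming the real page uses the same `lvtitle` and `vip` class names shown in the question, is to query with DOMXPath. The variable names `$html` and `$titles` are only illustrative.

```php
<?php
// Markup copied from the question, including its line breaks.
$html = <<<'HTML'
<h3 class="lvtitle">
<a href="http://www.ebay.com/itm/Chicago-Chicago-XXX-Audio-CD-/351478948979?hash=item51d5c72473"
 class="vip" title="Click this link to access Chicago, Chicago XXX Audio CD">
Chicago, Chicago XXX Audio CD
</a>
</h3>
HTML;

// Collect libxml warnings internally instead of printing them,
// since real-world pages are rarely perfectly valid HTML.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Assumption: the wanted link is an <a class="vip"> directly inside <h3 class="lvtitle">.
// The concat/normalize-space idiom matches a class name as a whole word.
$xpath = new DOMXPath($doc);
$query = '//h3[contains(concat(" ", normalize-space(@class), " "), " lvtitle ")]'
       . '/a[contains(concat(" ", normalize-space(@class), " "), " vip ")]';

$titles = [];
foreach ($xpath->query($query) as $node) {
    // trim() drops the newlines that surround the text in the source markup
    $titles[] = trim($node->textContent);
}

print_r($titles);
```

Because the selection depends on the class attributes and not on the `href`, it should keep working when the link itself changes.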

    ...or, if you really need a regex for some reason (I don't see why a parser wouldn't work)...

    $string = '<h3 class="lvtitle"><a href="http://www.ebay.com/itm/Chicago-Chicago-XXX-Audio-CD-/351478948979?hash=item51d5c72473"  class="vip" title="Click this link to access Chicago, Chicago XXX Audio CD">Chicago, Chicago XXX Audio CD</a></h3>';
    preg_match_all('~<a\h.*?>(.*?)</a>~', $string, $links_content);
    print_r($links_content[1]);
    

    Output:

    Array
    (
        [0] => Chicago, Chicago XXX Audio CD
    )
    

    ~ = delimiter
    <a = literally match <a
    \h = a horizontal whitespace character (space or tab)
    .*? = any characters, as few as possible, up to the first occurrence of the next character
    > = a literal >
    (.*?) = a capture group capturing everything, as few characters as possible, up to the next part of the pattern
    </a> = literal </a>
    ~ = closing delimiter
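
The breakdown above can also live inside the pattern itself. This is a sketch of the same expression written with the `x` modifier, which makes PCRE ignore whitespace in the pattern and treat `#` as the start of a comment. The comments must not contain the `~` delimiter. The variable name `$pattern` is only illustrative.

```php
<?php
$string = '<h3 class="lvtitle"><a href="http://www.ebay.com/itm/Chicago-Chicago-XXX-Audio-CD-/351478948979?hash=item51d5c72473"  class="vip" title="Click this link to access Chicago, Chicago XXX Audio CD">Chicago, Chicago XXX Audio CD</a></h3>';

// Same pattern as above, spread over several lines with the x modifier.
$pattern = '~
    <a\h      # literal "<a" followed by one horizontal whitespace character
    .*?       # as few characters as possible
    >         # up to the first literal ">"
    (.*?)     # capture group 1: the link text, matched lazily
    </a>      # literal closing tag
~x';

preg_match_all($pattern, $string, $links_content);
print_r($links_content[1]);
```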

    If you prefer a regex101 write-up, see https://regex101.com/r/sT6yA9/1.

    Also note that preg_match_all is used in case your string has multiple links in it. With a single occurrence you could use preg_match.
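
One caveat: the markup in the question spans several lines, and by default `.` does not match a newline, so the pattern above only matches when the tag and its text sit on one line. A hedged sketch of a variant for the multi-line case with a single link uses preg_match, the `s` modifier (so `.` also matches newlines), `[^>]*` for the attributes, and trim() on the result. The variable names `$html` and `$title` are only illustrative.

```php
<?php
// Markup copied from the question, including its line breaks.
$html = <<<'HTML'
<h3 class="lvtitle">
<a href="http://www.ebay.com/itm/Chicago-Chicago-XXX-Audio-CD-/351478948979?hash=item51d5c72473"
 class="vip" title="Click this link to access Chicago, Chicago XXX Audio CD">
Chicago, Chicago XXX Audio CD
</a>
</h3>
HTML;

$title = null;

// \s     : any whitespace after "<a", including a newline
// [^>]*  : the attributes, i.e. everything up to the closing ">" of the tag
// (.*?)  : the link text, lazily; the s modifier lets "." cross line breaks
if (preg_match('~<a\s[^>]*>(.*?)</a>~s', $html, $match)) {
    // Remove the newlines around the text
    $title = trim($match[1]);
}

echo $title, PHP_EOL;
```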

    This answer was accepted by the asker as the best answer.