doushan6161 2012-09-06 23:04

PHP regex: get the text of all links with a specified class [duplicate]

Possible Duplicate:
How to parse and process HTML with PHP?

I'm trying to use PHP and regex to grab all the hyperlinks from an external page. The links I care about scraping are structured as follows:

<li class="magic"><a href="http://blah.com">TargetText1</a></li>
<li class="magic"><a href="http://blah.com">TargetText2</a></li>

Please bear in mind that I'm trying to get the anchor text, NOT the URL. The code below works, but it scrapes every link on the page. I only want the links wrapped in an li with the class shown above.

 $url = "http://www.example.com"; 
 $input = @file_get_contents($url) or die("Could not access file: $url"); 

 $regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";

 if(preg_match_all("/$regexp/siU", $input, $matches)) { 
  print_r($matches);
 }

1 answer

  • dongzaizai2015 2012-09-06 23:11
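    <?php

        // Fetch the page; $url is the address used in the question.
        $url  = "http://www.example.com";
        $html = file_get_contents($url);
        if ($html === false) {
            die("Could not access file: $url");
        }

        $dom = new DOMDocument;
        $dom->preserveWhiteSpace = false;  // only takes effect when set before loading
        libxml_use_internal_errors(true);  // collect warnings about invalid HTML instead of printing them
        $dom->loadHTML($html);
        libxml_clear_errors();

        // Print the anchor text of the first link inside each <li class="magic">.
        foreach ($dom->getElementsByTagName('li') as $li) {
            if ($li->getAttribute('class') == 'magic') {
                $links = $li->getElementsByTagName('a');
                if ($links->length) {
                    echo $links->item(0)->nodeValue, "\n";
                }
            }
        }

    ?>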
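
The loop above compares the whole class attribute to 'magic', so an li with class="magic extra" would be skipped, and only the first link in each li is read. A possible variant, offered as a sketch rather than as part of the original answer, uses DOMXPath to select every a inside any li whose class list contains magic. The function name magicLinkTexts and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper (not from the original answer): returns the anchor text
// of every <a> nested in an <li> whose class list includes "magic".
function magicLinkTexts(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);  // collect parse warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);         // restore the earlier error-handling setting

    $xpath = new DOMXPath($dom);
    // Padding @class with spaces makes this match class="magic" and
    // class="magic extra", but not class="magical".
    $query = "//li[contains(concat(' ', normalize-space(@class), ' '), ' magic ')]//a";

    $texts = [];
    foreach ($xpath->query($query) as $a) {
        $texts[] = trim($a->textContent);
    }
    return $texts;
}

// Sample markup modelled on the question, plus two links that should be skipped.
$sample = '<ul>'
    . '<li class="magic"><a href="http://blah.com">TargetText1</a></li>'
    . '<li class="magic extra"><a href="http://blah.com">TargetText2</a></li>'
    . '<li class="magical"><a href="http://blah.com">Skipped</a></li>'
    . '<li><a href="http://blah.com">AlsoSkipped</a></li>'
    . '</ul>';

print_r(magicLinkTexts($sample));
```

For a live page, the string returned by file_get_contents($url) can be passed in place of $sample. The XPath expression does the filtering in one step, so no nested loops or class comparisons are needed in PHP.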
    
    The asker accepted this answer as the best answer.