duanjian7617 2013-09-07 08:51

Scraping an HTML page with preg_match

<?php 
    $contents = file_get_contents('link here');
    $doc = new DOMDocument();
    @$doc->loadHTML($contents);
    $xpath = new DOMXPath($doc);
    $xquery = '//tr[td[a]]';           
    $links = $xpath->query($xquery);   
    foreach ($links as $el) {
        $string = ($doc->saveHTML($el));
        preg_match('/<a class="LN" href=".*" onclick=".*">(.*)<\/a>/i', $string, $name); 
        preg_match('/<td align="center">.*\s*<\/td>\s*<td>(.*)<\/td>/i', $string, $locations);
        echo strip_tags($name[1]).' '.strip_tags($locations[1]);
    } 
?>

The value of my $string is:

  <tr>
<td>
1.
 </td>
<td>
<a class="LN" href="" onclick="">
<b>Aaberg, Aaron E</b></a>
</td>
<td align="center">
54 
</td>
<td><a href="">Anchorage, AK</a><br />
<a href="">Nondalton, AK</a><br />
</td>
<td> </td><td><a href="">
                    View Details
                  </a></td></tr>

Why can't I get my $locations[1]?
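A likely cause, which you can reproduce with the sample row above: in PCRE, "." does not match a newline unless the s modifier is set. In the sample, the contents of <td align="center"> and of the locations cell span several lines. As a result, the ".*" in the question's pattern cannot reach the closing </td>, so preg_match() returns 0 and leaves $locations as an empty array. In the sample as shown, the name pattern has the same problem, because its (.*) has to cross the line break before <b>. The sketch below uses a trimmed copy of the row to compare the question's pattern with a hypothetical fixed version that adds s and makes the quantifiers lazy. It is one way to patch the regex, not the approach taken in the answer.

```php
<?php
// Trimmed copy of the sample row from the question: the cells span several lines.
$row = <<<'HTML'
<td align="center">
54 
</td>
<td><a href="">Anchorage, AK</a><br />
<a href="">Nondalton, AK</a><br />
</td>
<td> </td><td><a href="">
                    View Details
                  </a></td></tr>
HTML;

// The question's pattern. Without the s modifier "." never matches "\n",
// so ".*" cannot get from <td align="center"> to its </td> on a later line.
$original = '/<td align="center">.*\s*<\/td>\s*<td>(.*)<\/td>/i';

// Hypothetical fix: "s" lets "." match newlines, and lazy ".*?" stops at the
// first </td>. With "s" but greedy ".*", the capture would run on to the
// "View Details" cell instead.
$fixed = '/<td align="center">.*?<\/td>\s*<td>(.*?)<\/td>/is';

// Prints int(0): no match, so $m1 is empty and $m1[1] is undefined.
var_dump(preg_match($original, $row, $m1));

if (preg_match($fixed, $row, $m2)) {
    // strip_tags() leaves "Anchorage, AK\nNondalton, AK\n"; trim() drops the final newline.
    echo trim(strip_tags($m2[1])), "\n";
}
```

For markup whose line breaks may move around, consuming whitespace explicitly with \s (as the answer does) or using an HTML parser tends to be more robust than ".*".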

  • duanpingzu7194 2013-09-07 09:45

    How about this?

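    <?php
    
    $contents = file_get_contents('http://....');
    
    // Matches one listing row. "*+" and "++" are possessive quantifiers: they never
    // give characters back, so a failed attempt fails fast instead of backtracking.
    // "\s*+" absorbs the line breaks between tags. With "@" as the delimiter,
    // "/" needs no escaping.
    $pattern =
    '@' . 
    '<td>\s*+' .
    '(?P<no>\d+)\.\s*+' .                                   // row number, e.g. "1."
    '</td>\s*+' .
    '<td>\s*+' .
    '<a class="LN" href="[^"]*+" onclick="[^"]*+">\s*+' .
    '<b>(?P<name>[^<]*+)</b>\s*+' .                         // the name inside <b>
    '</a>\s*+' .
    '</td>\s*+' .
    '<td align="center">[^<]*+</td>\s*+' .                  // centred cell ("54"), not captured
    '<td>\s*+' .
    '(?P<locations>(?:<a href="[^"]*+">[^<]*+</a><br />\s*+)++)' . // every location link
    '</td>' .
    '@'
    ;
    
    $results = array();
    preg_match_all($pattern, $contents, $matches, PREG_SET_ORDER);
    foreach ($matches as $i => $match) {
        // Split the captured locations cell into the individual link texts.
        preg_match_all('@<a href="[^"]*+">([^<]*+)</a>@', $match['locations'], $locations);
        $results[$i]['no'] = $match['no'];
        $results[$i]['name'] = $match['name'];
        $results[$i]['locations'] = $locations[1];
    }
    
    echo '<pre>';
    print_r($results);
    echo '</pre>';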
    

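    This works for the sample row above, provided the real page uses exactly the same markup (for example <br /> rather than <br>, and the same attribute order on the links).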
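    The question's code already loads the page into DOMDocument and DOMXPath, so another option is to skip the regexes and read the cells through XPath. XPath does not care how the markup is split across lines. The sketch below is a different technique from the regex answer above. It assumes the column layout of the sample row: the name link with class LN is in the second cell, and there is one <a> per location in the fourth. scrape_rows() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: extract the name and locations from each listing row.
// Assumes the sample row's layout: td[2] holds <a class="LN">, td[4] holds the location links.
function scrape_rows(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about malformed HTML instead of silencing them with "@".
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $results = [];
    // A narrower form of the question's //tr[td[a]]: only rows that have a name link.
    foreach ($xpath->query('//tr[td/a[@class="LN"]]') as $tr) {
        // string() returns the text of the first matching node as a plain PHP string.
        $name = trim($xpath->evaluate('string(td[2]/a[@class="LN"])', $tr));
        $locations = [];
        foreach ($xpath->query('td[4]/a', $tr) as $a) {
            $locations[] = trim($a->textContent);
        }
        $results[] = ['name' => $name, 'locations' => $locations];
    }
    return $results;
}

// Usage with the sample row from the question, wrapped in a <table>.
$sample = <<<'HTML'
<table><tr>
<td>
1.
 </td>
<td>
<a class="LN" href="" onclick="">
<b>Aaberg, Aaron E</b></a>
</td>
<td align="center">
54 
</td>
<td><a href="">Anchorage, AK</a><br />
<a href="">Nondalton, AK</a><br />
</td>
<td> </td><td><a href="">
                    View Details
                  </a></td></tr></table>
HTML;

print_r(scrape_rows($sample));
```

    Positional steps such as td[2] break if the site adds or reorders columns, so the indexes need checking against the real page.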

    This answer was accepted by the asker as the best answer.