dsplos5731 2011-10-30 04:18

PHP regular expression with the simple_html_dom library

I am trying to scrape IMDb with the following code.

$url = "http://www.imdb.com/search/title?languages=en|1&explore=year";
$html = new simple_html_dom();
$html->load(str_replace(' ','',$data = get_data($url)));

foreach($html->find('#left') as $total_movies)
{
$content = $total_movies->plaintext;
if(preg_match("/(?<total>[0-9,]+) titles/",$content,$matches))
{
    print_r($matches);
}
echo $content."<br>";
}

get_data() is just a cURL wrapper function I wrote.

The problem is that preg_match does not match, and I don't know why: the same pattern works in the snippet below, where $content holds the text that the code above scrapes.

$content = "1-50 of 101 titles.";
if(preg_match("/(?<total>[0-9,]+) titles/",$content,$matches))
print_r($matches);

1 answer

  • dsymx68408 2011-10-30 05:06

    The source on the site is actually:

    <div id="left">
    1-50 of 564,592
    titles.
    </div>
    

    Notice the line break between the number and "titles."; it would need to be stripped out or allowed for in your pattern.
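As a rough sketch of the second option, the pattern can accept any whitespace before "titles" instead of a single space. The helper name extract_total and the sample string are assumptions made for illustration; they are not part of the original code.

```php
<?php
// Hypothetical helper (not from the original post): returns the total as an
// integer, or null when the text does not contain "<number> titles".
function extract_total($content)
{
    // \s+ accepts spaces, tabs and line breaks between the number and "titles".
    if (preg_match('/(?<total>[0-9,]+)\s+titles/', $content, $matches)) {
        // Strip the thousands separators before converting to an integer.
        return (int) str_replace(',', '', $matches['total']);
    }
    return null;
}

// Sample text modelled on the div shown above, including the line break.
$content = "\n1-50 of 564,592\ntitles.\n";
echo number_format(extract_total($content)), "\n";
```

The alternative is to collapse the whitespace first, for example with preg_replace('/\s+/', ' ', $content), and keep the original pattern unchanged.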

    Here's a way to reach your goal without any extra library.

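      <?php
        $url = "http://www.imdb.com/search/title?languages=en|1&explore=year";
        $temp = file_get_contents($url);

        // Parse the page with the built-in DOM extension.
        // The @ silences the warnings that malformed HTML triggers.
        $xml = new DOMDocument();
        @$xml->loadHTML($temp);

        $matches = array();
        foreach ($xml->getElementsByTagName('div') as $div) {
            if ($div->getAttribute('id') == 'left') {
                // Capture the number that follows "of", e.g. "564,592".
                if (preg_match("#of ([0-9,]+)#", $div->nodeValue, $match)) {
                    // Drop the thousands separators so the value is numeric.
                    $matches[] = (int) preg_replace('/[^0-9]/', '', $match[1]);
                }
            }
        }

        if (!empty($matches)) {
            echo number_format($matches[0]); // 564,592
        }

        ?>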
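A possible variation on the same idea, assuming the markup matches the fragment quoted above, is to let DOMXPath pick out the div directly instead of looping over every div. The $html string here is a stand-in for the downloaded page, not real IMDb output.

```php
<?php
// Stand-in for the downloaded page; the real markup may differ.
$html = '<html><body><div id="left">
1-50 of 564,592
titles.
</div><div id="right">other</div></body></html>';

$doc = new DOMDocument();
// Suppress the warnings that real-world HTML tends to trigger.
@$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
$total = null;

// Select only the div whose id attribute is "left".
foreach ($xpath->query('//div[@id="left"]') as $div) {
    if (preg_match('/of\s+([0-9,]+)/', $div->nodeValue, $match)) {
        // Remove the thousands separators before converting to an integer.
        $total = (int) str_replace(',', '', $match[1]);
    }
}

if ($total !== null) {
    echo number_format($total), "\n";
}
```

The XPath query keeps the selection logic in one expression, which may be easier to adjust if the page structure changes.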
    
    This answer was accepted by the asker as the best answer.
