PHP regular expression with the simple_html_dom library

I am trying to scrape IMDb with the following code.

$url = "http://www.imdb.com/search/title?languages=en|1&explore=year";
$html = new simple_html_dom();
$html->load(str_replace('&nbsp;','',$data = get_data($url)));

foreach($html->find('#left') as $total_movies)
{
    $content = $total_movies->plaintext;
    if(preg_match("/(?<total>[0-9,]+) titles/",$content,$matches))
    {
        print_r($matches);
    }
    echo $content."<br>";
}

get_data() is just a cURL wrapper function I wrote.

The problem is that preg_match does not match, and I don't know why. The same pattern works in the snippet below, where $content holds the text that the code above scrapes.

$content = "1-50 of 101 titles.";
if(preg_match("/(?<total>[0-9,]+) titles/",$content,$matches))
print_r($matches);

Answer by dsymx68408, 2011-10-30 05:06:

The source on the site is actually:

<div id="left"> 1-50 of 564,592 titles. </div>

Notice the whitespace in the element's text: it needs to be stripped out or allowed for in your pattern.
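
To illustrate the whitespace point, here is a minimal sketch. It assumes the element's text has a line break between the number and "titles"; the sample string is made up for illustration and is not copied from the live page. Replacing the literal space in the pattern with \s+ lets it match any run of whitespace:

```php
<?php
// Assumed sample text: a newline (not a space) separates the number from "titles".
$content = " 1-50 of 564,592\ntitles. ";

// The pattern from the question requires exactly one literal space before "titles".
$strict = preg_match('/(?<total>[0-9,]+) titles/', $content, $m1);

// \s+ accepts any run of whitespace: spaces, tabs, or newlines.
$loose = preg_match('/(?<total>[0-9,]+)\s+titles/', $content, $m2);

var_dump($strict);        // int(0): no match
var_dump($loose);         // int(1): match
echo $m2['total'], "\n";  // 564,592
```

Collapsing whitespace first, for example with preg_replace('/\s+/', ' ', trim($content)), would be another way to make the original pattern match.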

Here's a way to reach your goal without any additional library.

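<?php
$url = "http://www.imdb.com/search/title?languages=en|1&explore=year";
$temp = file_get_contents($url);

$xml = new DOMDocument();
@$xml->loadHTML($temp); // suppress warnings from malformed HTML

$matchs = array();
foreach ($xml->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('id') == 'left') {
        // Capture the total that follows "of", e.g. "of 564,592"
        if (preg_match("#of ([0-9,]+)#", $div->nodeValue, $match)) {
            // Keep only the digits so the value can be formatted as a number
            $matchs[] = preg_replace('/[^0-9]/', '', $match[1]);
        }
    }
}

if (isset($matchs[0])) {
    echo number_format($matchs[0]); // 564,592
}
?>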
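
The same idea can be tried without network access. This is a minimal sketch that assumes an inline HTML string in place of the live page; the sample markup, the use of DOMXPath, and the \s+ in the pattern are illustrative choices, not part of the answer's code:

```php
<?php
// Stand-in HTML for illustration only; the real page would be fetched over HTTP.
$sample = '<html><body><div id="left"> 1-50 of 564,592' . "\n"
        . 'titles. </div></body></html>';

$doc = new DOMDocument();
// Suppress the warnings loadHTML() raises on imperfect real-world markup.
@$doc->loadHTML($sample);

// Select the div by its id with XPath instead of looping over every div.
$xpath = new DOMXPath($doc);
$node = $xpath->query('//div[@id="left"]')->item(0);

$total = null;
if ($node !== null && preg_match('/of\s+([0-9,]+)/', $node->nodeValue, $match)) {
    // Drop the thousands separators so the value becomes a plain integer.
    $total = (int) str_replace(',', '', $match[1]);
}

if ($total !== null) {
    echo number_format($total), "\n"; // 564,592
}
```

Capturing only the number (group 1) and converting it to an integer keeps the parsing step separate from the display step, so number_format() is applied once at the end.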
This answer was selected by the asker as the best answer.

Question asked 2011-10-30 04:18; tagged php.