Detailed PHP grabber for the IMDb Top 250 [closed]

I'm trying to build a personal movie database, and I want the data to be fetched from IMDb. Yes, I know there are plenty of APIs and grabbers out there, but none of them does what I need.

So far I couldn't come up with a way to parse the http://www.imdb.com/chart/top list and get my data from it.

I've tried to do it with a cURL script, but no luck!

For example:

I want to know whether The Godfather: Part II is in the Top 250 and, if so, what its rank is.

dph6308 2013-10-25 23:33
API

I would look into whether or not IMDb has an API available. If it does, this will likely be as simple as querying a URL and parsing the returned data with json_decode.
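As a rough illustration of the json_decode step, here is a minimal sketch. The JSON payload and its field names ("items", "rank", "title") are made up for the example; a real API would define its own structure.

```php
<?php
// Hypothetical JSON payload; a real API's field names will differ.
$json = '{"items":[{"rank":1,"title":"The Shawshank Redemption"},'
      . '{"rank":2,"title":"The Godfather"}]}';

// Decode into associative arrays (second argument true).
$data = json_decode($json, true);
if (!is_array($data) || !isset($data['items'])) {
    throw new RuntimeException('Unexpected response: ' . json_last_error_msg());
}

// Build the RANK => TITLE map.
$top = array();
foreach ($data['items'] as $item) {
    $top[(int) $item['rank']] = $item['title'];
}
```

In practice $json would come from the API's URL rather than a string literal.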

No API available?

Get the webpage

There is no need to use cURL; a simple file_get_contents will do the trick.
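A minimal sketch of the download step with basic error handling follows. fetchPage() is a hypothetical helper name, and the timeout and User-Agent values are assumptions; fetching http:// URLs this way also requires allow_url_fopen to be enabled.

```php
<?php
// fetchPage() is a hypothetical helper, not part of PHP.
// It downloads a page and fails loudly instead of silently returning false.
function fetchPage($url)
{
    // Assumed settings: a 10 second timeout and a User-Agent header,
    // since some sites may reject requests that send none.
    $context = stream_context_create(array(
        'http' => array(
            'timeout' => 10,
            'header'  => "User-Agent: personal-movie-db/0.1\r\n",
        ),
    ));

    // @ suppresses the warning; the failure is reported via the exception below.
    $page = @file_get_contents($url, false, $context);
    if ($page === false) {
        throw new RuntimeException("Could not download $url");
    }
    return $page;
}
```

Usage would then be along the lines of `$page = fetchPage("http://www.imdb.com/chart/top");`.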

Extract the list

Now that you have the web page, you have two options:

Parse the web page with a DOM parser (long winded, not necessary)

Regex to extract the info you're after (simple, short)
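For comparison, the DOM parser option could look roughly like this. It is a sketch that runs on a small sample string and assumes each row has the shape `<td class="titleColumn">RANK. <a ...>TITLE</a>`; the live page's markup may differ.

```php
<?php
// Sample markup standing in for the downloaded page (assumed format).
$html = '<table>'
      . '<tr><td class="titleColumn">1. <a href="/title/a/" title="x">The Shawshank Redemption</a></td></tr>'
      . '<tr><td class="titleColumn">2. <a href="/title/b/" title="y">The Godfather</a></td></tr>'
      . '</table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // real-world HTML is rarely valid; keep warnings quiet
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$top = array();
foreach ($xpath->query('//td[@class="titleColumn"]') as $cell) {
    $link = $xpath->query('.//a', $cell)->item(0);
    // The cell text starts with "RANK."; pull out the leading number.
    if ($link !== null && preg_match('/^\s*(\d+)\./', $cell->textContent, $m)) {
        $top[(int) $m[1]] = trim($link->textContent);
    }
}
```

It is longer than a regex, but it does not depend on the exact spacing or attribute order of the tags.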

Regex

A quick look at the source code of the list shows the list is in the format:

<td class="titleColumn">RANK. <a href="/link/to/film" title="Director/Leads" >FILM TITLE</a>

The parts in CAPS are the information we need.

Converting this into a regex is simple: remove the noise and replace it with (non-greedy) wildcards.

<td class="titleColumn">RANK. <a.*?>FILM TITLE</a>

Add your capture groups:

<td class="titleColumn">(RANK). <a.*?>(FILM TITLE)</a>

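Replace the placeholders with patterns (\d+ for the rank, a non-greedy .*? for the title), escape the literal dot, and that's it: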

#<td class="titleColumn">(\d+)\. <a.*?>(.*?)</a>#

Example

Using this in practice:

$page = file_get_contents("http://www.imdb.com/chart/top"); // Download the page
preg_match_all('#<td class="titleColumn">(\d+)\. <a.*?>(.*?)</a>#', $page, $matches); // Match ranks and titles
$top250 = array_combine($matches[1], $matches[2]); // Final array in format RANK => TITLE

Then you can do something like:

echo $top250[1]; /** Output: The Shawshank Redemption */
echo array_search("The Godfather", $top250); /** Output: 2 */

You can then use standard PHP array functions to do things like search for films.
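To answer the question's own example (is The Godfather: Part II in the list, and at what rank), a sketch along these lines should work. The sample markup and its ranks are illustrative only, and findRank() is a hypothetical helper.

```php
<?php
// Sample markup standing in for the downloaded page (assumed format and ranks).
$page = '<td class="titleColumn">2. <a href="/a" title="x">The Godfather</a></td>'
      . '<td class="titleColumn">3. <a href="/b" title="y">The Godfather: Part II</a></td>';

preg_match_all('#<td class="titleColumn">(\d+)\. <a.*?>(.*?)</a>#', $page, $matches);

// Decode entities such as &amp; so titles compare cleanly.
$titles = array_map('html_entity_decode', $matches[2]);
$top250 = array_combine($matches[1], $titles);

// findRank() is a hypothetical helper: returns the rank as an int,
// or null when the title is not in the list.
function findRank(array $top250, $title)
{
    $rank = array_search($title, $top250, true);   // strict comparison of titles
    return $rank === false ? null : (int) $rank;
}
```

With this sample data, `findRank($top250, 'The Godfather: Part II')` gives 3 and an unlisted title gives null. Checking the result against false with === matters because array_search returns false, not 0 or null, when nothing matches.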

http://php.net/file_get_contents
http://php.net/preg_match_all
http://php.net/array_combine
http://php.net/array_search

Side note

Especially if you use the "No API" method above, you might like to think about storing the results locally and only updating them every X hours/days/weeks to save load times etc. I assume that you are already planning on doing this (as you said you wanted a personal movie database), but I thought I'd mention it anyway!
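One possible way to do that is a small file cache, sketched below. loadCached() is a hypothetical helper, and the cache file path, maximum age, and JSON storage format are all assumptions; a database table would work just as well.

```php
<?php
// loadCached() is a hypothetical helper: it returns the cached list while it is
// fresh; otherwise it calls $fetch, stores the result as JSON, and returns it.
function loadCached($cacheFile, $maxAgeSeconds, $fetch)
{
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $maxAgeSeconds) {
        $cached = json_decode(file_get_contents($cacheFile), true);
        if (is_array($cached)) {
            return $cached;
        }
    }

    $fresh = $fetch();   // e.g. download and parse the chart
    file_put_contents($cacheFile, json_encode($fresh), LOCK_EX);
    return $fresh;
}
```

Here $fetch would be a closure that downloads the page and builds the RANK => TITLE array, so the site is only hit when the cached copy is older than $maxAgeSeconds.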