PHP - Scrape url (get meta og:, meta link, or image)

I'm looking into how to scrape a URL in the "best and most recent way". I want to retrieve one image from a URL: first from a link tag such as <link rel="image_src" href="http://stackoverflow.com/images/logo.gif" />, then from an og tag, and, if I still have nothing, from the first img that is big enough. Put differently, a light version of Facebook's thumbnail retrieval.

While reading around, I thought I had found what I needed, but the solution turned out to be fairly old (5-6 years: http://www.lightspeedretail.com/cloud/blog/2007/08/scraping-links-with-php/). It basically uses cURL, DOMDocument, and XPath. After that I would only have to work on the image URL I got, for instance storing a few versions of it in different sizes, but I'm fine with that part.

Is there something better than this solution? Ideally, an example for the link tag would be fantastic.
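
To make the question concrete, here is a minimal sketch of the fallback order described above (link tag, then og tag, then first big enough img), using the cURL, DOMDocument, and XPath approach mentioned. The function names fetchHtml and findThumbnailUrl, the 200-pixel threshold, and the use of width/height attributes to judge "big enough" are assumptions made for illustration, not an established API; a real implementation would probably need to download images to measure them.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: download a page with cURL, following redirects.
 * Returns null when the request fails.
 */
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 10,   // assumed timeout, in seconds
    ]);
    $body = curl_exec($ch);

    return is_string($body) ? $body : null;
}

/**
 * Hypothetical helper: return the first thumbnail candidate in $html, or null.
 * Order: <link rel="image_src">, <meta property="og:image">, then the first
 * <img> whose width and height attributes are both at least $minSize.
 */
function findThumbnailUrl(string $html, int $minSize = 200): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }

    $dom = new DOMDocument();
    // Real-world HTML is rarely valid: buffer parser warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Steps 1 and 2: the link tag first, then the Open Graph meta tag.
    $queries = [
        '//link[@rel="image_src"]/@href',
        '//meta[@property="og:image"]/@content',
    ];
    foreach ($queries as $query) {
        $nodes = $xpath->query($query);
        if ($nodes !== false && $nodes->length > 0) {
            $value = trim((string) $nodes->item(0)->nodeValue);
            if ($value !== '') {
                return $value;
            }
        }
    }

    // Step 3: first <img> whose declared size is large enough.
    // Images without width/height attributes are skipped (cast yields 0).
    foreach ($xpath->query('//img[@src]') as $img) {
        $width  = (int) $img->getAttribute('width');
        $height = (int) $img->getAttribute('height');
        if ($width >= $minSize && $height >= $minSize) {
            return $img->getAttribute('src');
        }
    }

    return null;
}
```

Usage would be something like findThumbnailUrl(fetchHtml($url) ?? ''). The returned value may be a relative URL, so it would still have to be resolved against the page URL before downloading and resizing the image.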
