Scraping unique image URLs from HTML

I'm using PHP to curl a web page (the URL is entered by the user; let's assume it's valid). Example: http://www.youtube.com/watch?v=Hovbx6rvBaA

I need to parse the HTML and extract all de-duplicated URLs that look like images: not just the ones in img src="", but any URL on that page ending in jpe?g|bmp|gif|png, etc. (In other words, I don't want to parse the DOM; I want to use a regex.)

I then plan to curl the URLs for their width and height information and ensure that they are indeed images, so don't worry about security-related stuff.

doraemon0769 2010-08-19 06:07
Collect all image URLs into an array, then use array_unique() to remove duplicates.

$my_image_links = array_unique( $my_image_links ); // No more duplicates
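One thing to watch for: array_unique() keeps the original array keys, so the de-duplicated list can have gaps in its indexes. Below is a minimal sketch of that behaviour, assuming $my_image_links holds a few placeholder URLs (they are made up for illustration); array_values() is used to reindex the result.

```php
<?php
// Sample data; these URLs are placeholders, not taken from a real page.
$my_image_links = [
    'http://example.com/a.png',
    'http://example.com/b.jpg',
    'http://example.com/a.png', // duplicate of the first entry
    'http://example.com/c.gif',
];

// array_unique() keeps the first occurrence of each value and preserves keys,
// so the surviving keys here are 0, 1 and 3.
$unique = array_unique($my_image_links);

// array_values() reindexes to 0, 1, 2 so the list can be looped over by index.
$unique = array_values($unique);

print_r($unique);
```

If you only ever iterate with foreach, the reindexing step is optional.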

If you really want to do this with a regex, we can assume each image URL is delimited on both sides by ', ", whitespace (spaces, tabs, line breaks), the beginning of the line, >, <, or whatever else you can think of. So we can do:

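// A ^ inside [...] is a literal caret, so the start/end anchors sit outside the character classes.
$pattern = '/(?:^|[\'"\s<>])([^\'"\s<>]+\.(jpe?g|bmp|gif|png))(?=$|[\'"\s<>])/i';
preg_match_all($pattern, html_entity_decode($resultFromCurl), $matches);
$imgs = array_unique($matches[1]); // $matches[1] holds the captured URLs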

The above will capture the image link in stuff like:

<p>Hai guys look at this ==> http://blah.com/lolcats.JPEG</p>
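Putting the two steps together, here is a rough sketch of a helper. The function name extract_image_urls and the sample HTML are made up for illustration, and the pattern is a variation on the one above: it uses lookarounds so the delimiters are not consumed, and it additionally treats ( and ) as delimiters (an assumption, so that CSS url(...) values are picked up).

```php
<?php
// Hypothetical helper: returns the de-duplicated, reindexed image-like URLs found in $html.
function extract_image_urls(string $html): array
{
    // A candidate is a run of characters that are not quotes, whitespace,
    // angle brackets or parentheses, ending in a known image extension.
    // The lookarounds require a delimiter (or the start/end of the string)
    // on each side without consuming it.
    $pattern = '/(?<![^\'"\s<>()])[^\'"\s<>()]+\.(?:jpe?g|bmp|gif|png)(?![^\'"\s<>()])/i';

    preg_match_all($pattern, html_entity_decode($html), $matches);

    // $matches[0] holds the full matches; drop duplicates and reindex.
    return array_values(array_unique($matches[0]));
}

// Sample input, including a duplicate and a URL outside any img tag.
$html = <<<'HTML'
<p>Hai guys look at this ==> http://blah.com/lolcats.JPEG</p>
<img src="http://blah.com/cat.png"> <a href='http://blah.com/cat.png'>same cat</a>
<div style="background:url(/img/bg.gif)"></div>
HTML;

print_r(extract_image_urls($html));
```

In this sample the repeated cat.png URL is reported once. Relative paths such as /img/bg.gif come back as-is, so they would still need to be resolved against the page URL before being fetched.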
