Regex to extract image URLs from HTML code in PHP [duplicate]

Possible Duplicates:
Grabbing the href attribute of an A element
Best methods to parse HTML

I have been using this code to extract images from HTML code in PHP:

$output = preg_match_all( '/<img.+src=[\'"]([^\'"]+)[\'"].*>/i', $content, $matches);
if ( $output > 0 ) echo $matches[1][0];

It has always worked fine for me, but it's misbehaving with one particular piece of HTML. I don't have a good grip on regex, so I need help figuring this out.

Works for:

<p>
    I finally decided to try Pomodoro technique to see how well it can improve my productivity as I am a lot disorganised, lazy sorta geek (well who isn’t?). So I built up a small script which acts as a Pomodoro timer for me using <a href="http://blog.ashfame.com/2011/04/ubuntu-notification-system/">Ubuntu notification system</a> (Do read it if you haven’t, you need to install lib-notify package for this script to work).
</p>
<p>
    I have created a launcher in my top panel, with which I start a new <em>pomodori</em> (name for a new period of time, lets call it a Pomodoro anyway). It calls up the script which alerts me that a new Pomodoro (time period) has started and then alert me again when the timer ends and I should take a small break.
</p>
<p>
    Here is the script:
</p>
<pre class="brush: bash; title: ; toolbar: false;" title="">
 DISPLAY=:0 notify-send -t 1000 -i /home/ashfame/Dropbox/Ubuntu/icons/pomodoro.png "New Pomodoro starts" "You have 25 minutes to work."# 25 minutes timersleep 1500DISPLAY=:0 notify-send -t 1000 -i /home/ashfame/Dropbox/Ubuntu/icons/pomodoro.png "Pomodoro ends" "Take a break!"
</pre>
<p>
    As soon as I click the launcher, the first notification appears telling me that a new Pomodoro has started.
</p>
<p>
    <img class="aligncenter" src="http://blog.ashfame.com/wp-content/uploads/2011/04/pomodoro-starts.png" alt="pomodoro starts">
</p>
<p>
    Then it sleeps for 1500 secs = 25 minutes. And after that the second notification appears telling me that the Pomodoro has ended.
</p>
<p>
    <img class="aligncenter" src="http://blog.ashfame.com/wp-content/uploads/2011/04/pomodoro-ends.png" alt="pomodoro ends">
</p>
<p>
    I just take a 3-5 minutes break or even longer (I am the boss!), and then I again click on the launcher starting another Pomodoro and I work for another 25 minutes. You can use the same tomato icon, if you want.
</p>
<p>
    <img class="aligncenter" src="http://blog.ashfame.com/wp-content/uploads/2011/04/pomodoro.png" alt="pomodoro">
</p>
<p>
    Enjoy the awesomeness of Ubuntu and ditch Windows, yes I am an Ubuntu advocate and will push you to switch all the time <img src='http://blog.ashfame.com/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley'>
</p>

Doesn't work for:

<p>
    <img style="margin: 0px 10px 5px 0px" src="http://ijew.com.br/wp-content/uploads/HLIC/5b8b8f82bd69fd4a78aa114fd91bd9b5.jpg" width="300" height="226">
</p>
<p>
    Hey ijews! Pessach é inesquecível! E quem pode esquecer comendo 8 dias matzá?!
</p>
<p>
    Produção caseira muito bem feita.
</p><!--more-->
<p>
    &nbsp;
</p>
<p>
    <iframe title="YouTube video player" width="480" height="390" src="http://www.youtube.com/embed/d3D6O_sBOlc?rel=0" frameborder="0" allowfullscreen=""></iframe>
</p>
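douxin0251 Hmm.. thanks :)
over 9 years ago
douying1119 tbh, I rarely use regex. But they do have their uses. Validation/filtering of input comes to mind. Of course, you can use filter_var (which I prefer), but filter_var doesn't cover every use case you might want to validate or filter, so for convenience it offers a regex callback.
over 9 years ago
doudou3716 The earlier link was the same one; I can see it now. On a side note, if regex shouldn't be used for HTML parsing, do they have any other practical use?
over 9 years ago
dongxian4531 Thanks for the links, I didn't know it was crap. I will definitely use them in the future, but I felt regex would be faster, although you mention the "regex engine", which makes it sound heavy. The real check would be to profile a test case. Have you done that? I can take your word for it, but I don't have anything to defend my point. Thanks a lot :)
over 9 years ago
douxu5845 Why would it be better? Just because it's only one line of code? That says nothing about the time it takes to match that string. The regex engine has to be loaded too. But who cares whether it takes 25 ms or 50 ms? It's still just microseconds. And you spent an hour fixing the regex. Like I already said, if you had used a proper parser from the start, you wouldn't have run into this problem. That time would have been better spent adding new business value to your application. On a side note, SimpleHTMLDom is crap. Use any of stackoverflow.com/questions/3577641
over 9 years ago
dqys98341 I agree, but regex is surely better than loading an HTML parser on every page load for a single image. After all, it's just finding a simple image URL. I use Simple HTML Parser myself when the requirements are broader.
over 9 years ago
douxi2670 Sorry, but that's nonsense. Profile your code to determine whether there is any significant overhead before jumping to the wrong conclusion. Also consider that if you had used DOM from the start, you wouldn't have run into this problem. The reliability of a solution is worth more than shaving off a few microseconds you don't need to save.
over 9 years ago
doune1000 Yes, I was getting the wrong match, but the problem is solved now. Levu pointed out the change needed. :)
over 9 years ago
dongyue110702 Hi, I just tested your regex with the <img/> statement you mentioned, and it does group the src. Are you sure you're not looking at the wrong $match[1]?
over 9 years ago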
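
The DOM-based approach argued for in the comments could look roughly like the sketch below. It assumes PHP's bundled DOM extension is available; the example.com URLs and the $content string are made-up stand-ins for the real post content, not taken from the question.

```php
<?php
// Hypothetical input: an <img> followed by an <iframe> that also carries a src attribute.
$content = '<p><img style="margin: 0" src="http://example.com/a.jpg" width="300"></p>'
         . '<p><iframe width="480" src="http://example.com/embed/video"></iframe></p>';

// Collect parser warnings internally instead of printing them for imperfect markup.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($content);
libxml_clear_errors();

// Only <img> elements are visited, so the iframe's src can never be picked up.
$urls = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $urls[] = $img->getAttribute('src');
}

// Print the first image URL, if there is one.
echo $urls[0] ?? '', "\n";
```

Because the parser works on elements rather than on raw text, attribute order and other tags sharing an attribute name do not affect the result.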

1 answer

Turn <img.+src to either <img.+?src (lazy mode) or - even better - to <img[^>]+src.
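
A small sketch of why this change matters, under the assumption that the failing content reaches preg_match_all without line breaks between the tags (the dot does not match newlines unless the s modifier is set). The example.com URLs and the variable names $greedy and $fixed are invented for illustration.

```php
<?php
// Hypothetical single-line input: an <img> followed by an <iframe> with its own src.
$content = '<p><img style="margin: 0" src="http://example.com/a.jpg" width="300"></p>'
         . '<p><iframe width="480" src="http://example.com/embed/video"></iframe></p>';

// Original pattern: the greedy .+ runs to the end of the line and backtracks
// only as far as the LAST src=, which belongs to the iframe.
preg_match_all('/<img.+src=[\'"]([^\'"]+)[\'"].*>/i', $content, $greedy);

// Suggested pattern: [^>]+ cannot cross the closing > of the <img> tag,
// so the src that is found must sit inside that tag.
preg_match_all('/<img[^>]+src=[\'"]([^\'"]+)[\'"].*>/i', $content, $fixed);

echo $greedy[1][0], "\n"; // the iframe URL
echo $fixed[1][0], "\n";  // the image URL
```

The lazy variant <img.+?src also stops at the first src= after <img, but it can still wander into a later tag when the image itself has no src attribute, which is presumably why the negated character class is called the better option.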

duanfu3634 Thanks! That did it :)
over 9 years ago