dongzhan2029 2014-10-12 00:15

preg_match_all regular expression fails when there are spaces

I'm trying to extract image URLs from HTML source code using the following regex, but it fails when the image URL contains spaces. For example, with this tag:

<img src="http://a57.foxnews.com/global.fncstatic.com/static/managed/img/Entertainment/876/493/kazantsev pink bikini reuters.jpg?ve=1&amp;tl=1" alt="kazantsev pink bikini reuters.jpg" itemprop="image">

$image_regex_src_url = '/<img[^>]*'.'src=[\"|\'](.*)[\"|\']/Ui';
preg_match_all($image_regex_src_url, $string, $out, PREG_PATTERN_ORDER);

This returns only the following:
http://a57.foxnews.com/global.fncstatic.com/static/managed/img/Entertainment/876/493/kazantsev

Is there a way to match any character, including whitespace? Or is it something I have to set in the PHP configuration?

1 answer

  • donglu1973 2014-10-12 00:31

    You have several issues with your regular expression.

    First, there is no need to use the concatenation operator (.) to join the two parts of your expression; a single string literal works. Second, you don't need the alternation operator | inside your character classes. Within [...] it is a literal character, so ["|'] also matches a | itself.
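    To illustrate the point about | inside a character class, here is a minimal sketch. The variable names are made up for the example and are not from the question.

```php
<?php
// Inside [...] the pipe is an ordinary character, not alternation.
$with_pipe    = '/^["|\']$/';  // class contains ", | and '
$without_pipe = '/^["\']$/';   // class contains only " and '

$pipe_matches_with    = preg_match($with_pipe, '|');    // 1: the literal | is accepted
$pipe_matches_without = preg_match($without_pipe, '|'); // 0: | is not in the class
$quote_matches        = preg_match($without_pipe, '"'); // 1: quotes still match

var_dump($pipe_matches_with, $pipe_matches_without, $quote_matches);
```

    So dropping the | loses nothing for quote matching and avoids accepting a stray pipe.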

    The dot . matches any character except a newline. Since these tags come from HTML source, they could contain line breaks. You can either use the s (dotall) modifier, which makes the dot match any character including line breaks, or use a negated character class, which matches any character except the ones listed.
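    A small sketch of the newline behaviour, assuming a hypothetical tag whose src value is split across two lines (the example.com URL is invented for the demonstration):

```php
<?php
// Hypothetical tag whose src value contains a line break.
$html = "<img src=\"http://example.com/a\nb.jpg\" alt=\"x\">";

// Same pattern, with and without the s (dotall) modifier.
$without_s = preg_match('/src="(.*?)"/',  $html, $m1); // dot cannot cross "\n"
$with_s    = preg_match('/src="(.*?)"/s', $html, $m2); // dot also matches "\n"

var_dump($without_s); // int(0)
var_dump($with_s);    // int(1)
var_dump($m2[1]);     // the full value, line break included
```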

    Using the s (dotall) modifier:

    $image_regex_src_url = '/<img[^>]*src=(["\'])(.*?)\1/si';
    

    Using a negated character class [^...]:

    $image_regex_src_url = '/<img[^>]*src=(["\'])([^"\']*)\1/i';
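    As a quick check, both patterns can be run against the tag from the question. This is only a sketch: the variable names $dotall, $negated, $out_dotall and $out_negated are chosen for the example. Note that the first capturing group now holds the quote character, so the URLs end up in index 2 of the result rather than index 1.

```php
<?php
// The <img> tag from the question, with spaces in the src value.
$string = '<img src="http://a57.foxnews.com/global.fncstatic.com/static/managed/img/Entertainment/876/493/kazantsev pink bikini reuters.jpg?ve=1&amp;tl=1" alt="kazantsev pink bikini reuters.jpg" itemprop="image">';

$dotall  = '/<img[^>]*src=(["\'])(.*?)\1/si';
$negated = '/<img[^>]*src=(["\'])([^"\']*)\1/i';

preg_match_all($dotall,  $string, $out_dotall,  PREG_PATTERN_ORDER);
preg_match_all($negated, $string, $out_negated, PREG_PATTERN_ORDER);

// Group 1 is the opening quote; group 2 is the URL itself.
print_r($out_dotall[2]);
print_r($out_negated[2]);
```

    Both calls should return the whole URL, spaces included. The value is returned exactly as written in the source, so the &amp; entity is not decoded.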
    

    That said, it is much easier to use a parser such as DOM to extract the values.

    $doc = new DOMDocument;
    @$doc->loadHTML($html); // load the HTML; @ suppresses warnings about malformed markup
    
    $urls = []; // collected src values
    foreach ($doc->getElementsByTagName('img') as $node) {
        $urls[] = $node->getAttribute('src');
    }
    
    print_r($urls);
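
    For a self-contained run of the DOM approach, here is a sketch that feeds it the tag from the question. It assumes the DOM extension is enabled, and it swaps the @ operator for libxml_use_internal_errors(), which is one common way to keep parser warnings quiet; that substitution is my choice, not part of the snippet above.

```php
<?php
// Sample input: the <img> tag from the question.
$html = '<img src="http://a57.foxnews.com/global.fncstatic.com/static/managed/img/Entertainment/876/493/kazantsev pink bikini reuters.jpg?ve=1&amp;tl=1" alt="kazantsev pink bikini reuters.jpg" itemprop="image">';

// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);

$doc = new DOMDocument;
$doc->loadHTML($html);
libxml_clear_errors();

$urls = [];
foreach ($doc->getElementsByTagName('img') as $node) {
    // getAttribute() returns the parsed value, so &amp; comes back as &.
    $urls[] = $node->getAttribute('src');
}

print_r($urls);
```

    Unlike the regex versions, the parser decodes HTML entities in the attribute value, which is usually what you want before requesting the URL.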
    
    This answer was accepted by the asker as the best answer.
