dongzhan2029 2014-10-12 00:15
159 views
Accepted

preg_match_all regular expression fails when there is a space

I'm trying to extract image URLs from HTML source code using the following regex, but it fails when the image URL contains spaces. For example, this tag:

<img src="http://a57.foxnews.com/global.fncstatic.com/static/managed/img/Entertainment/876/493/kazantsev pink bikini reuters.jpg?ve=1&amp;tl=1" alt="kazantsev pink bikini reuters.jpg" itemprop="image">

$image_regex_src_url = '/<img[^>]*'.'src=[\"|\'](.*)[\"|\']/Ui';
preg_match_all($image_regex_src_url, $string, $out, PREG_PATTERN_ORDER);

This gives me back the following.
http://a57.foxnews.com/global.fncstatic.com/static/managed/img/Entertainment/876/493/kazantsev

Is there a way to match any character, including whitespace? Or is it something I have to set in the PHP configuration?

1 answer

  • donglu1973 2014-10-12 00:31

    You have several issues with your regular expression.

    First, you don't need the concatenation operator (.) to join the two parts of the pattern; a single string literal works just as well. Second, the alternation operator | is not needed inside a character class: there it is a literal pipe character, so [\"|\'] also matches |.
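
    To illustrate the second point, here is a minimal sketch (the variable names are made up for this example) showing that a pipe inside a character class is matched literally:

```php
<?php
// Inside a character class, | is a literal pipe, not alternation.
$withPipe    = '/^["|\']$/';  // matches ", ' and also |
$withoutPipe = '/^["\']$/';   // matches only " and '

var_dump(preg_match($withPipe, '|'));     // int(1)
var_dump(preg_match($withoutPipe, '|'));  // int(0)
var_dump(preg_match($withoutPipe, '"'));  // int(1)
```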

    The dot . matches any character except a newline. Because these tags sit in HTML source, they may well contain line breaks. You can either use the s (dotall) modifier, which makes the dot match any character including line breaks, or use a negated character class, which means "match any character except the ones listed".
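
    A small sketch of that difference, using a made-up two-line string rather than real HTML:

```php
<?php
// Hypothetical input: two letters separated by a line break.
$text = "a\nb";

var_dump(preg_match('/a.b/', $text));     // int(0): the dot stops at the newline
var_dump(preg_match('/a.b/s', $text));    // int(1): with s, the dot matches "\n"
var_dump(preg_match('/a[^x]b/', $text));  // int(1): a negated class matches "\n" too
```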

    Using the s (dotall) modifier:

    $image_regex_src_url = '/<img[^>]*src=(["\'])(.*?)\1/si';
    

    Using a negated character class [^ ]:

    $image_regex_src_url = '/<img[^>]*src=(["\'])([^"\']*)\1/i';
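
    As a quick check, the sketch below runs both patterns against the tag from the question. The variable names $dotall, $negated, $a and $b are only for this example. Note that both patterns capture the quote character in group 1, so the URL ends up in index 2 of the result rather than index 1.

```php
<?php
// Sample tag from the question; the src value contains spaces.
$string = '<img src="http://a57.foxnews.com/global.fncstatic.com/static/managed/img/Entertainment/876/493/kazantsev pink bikini reuters.jpg?ve=1&amp;tl=1" alt="kazantsev pink bikini reuters.jpg" itemprop="image">';

$dotall  = '/<img[^>]*src=(["\'])(.*?)\1/si';
$negated = '/<img[^>]*src=(["\'])([^"\']*)\1/i';

preg_match_all($dotall, $string, $a, PREG_PATTERN_ORDER);
preg_match_all($negated, $string, $b, PREG_PATTERN_ORDER);

// Group 1 holds the opening quote, group 2 holds the URL.
print_r($a[2]);
print_r($b[2]);
```

    Both calls should return the full URL, spaces included. The value is the raw attribute text, so the &amp; entity is left as written in the source.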
    

    That said, it is much easier to use a parser such as DOM to extract the values.

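    $doc = new DOMDocument;
    @$doc->loadHTML($html); // load the HTML; @ suppresses warnings about malformed markup

    $urls = []; // initialize so print_r() works even when no <img> is found
    foreach ($doc->getElementsByTagName('img') as $node) {
        $urls[] = $node->getAttribute('src'); // attribute value, spaces included
    }

    print_r($urls);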
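
    For completeness, here is a self-contained sketch of the DOM approach applied to the tag from the question. It assumes the DOM extension is enabled, and it uses libxml_use_internal_errors() in place of the @ operator, which is a substitution made for this example only.

```php
<?php
// Sample tag from the question; the src value contains spaces.
$html = '<img src="http://a57.foxnews.com/global.fncstatic.com/static/managed/img/Entertainment/876/493/kazantsev pink bikini reuters.jpg?ve=1&amp;tl=1" alt="kazantsev pink bikini reuters.jpg" itemprop="image">';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument;
$doc->loadHTML($html);
libxml_clear_errors();

$urls = [];
foreach ($doc->getElementsByTagName('img') as $node) {
    $urls[] = $node->getAttribute('src');
}

print_r($urls);
```

    Unlike the regex versions, the parser decodes entities in attribute values, so the URL returned here should end in ?ve=1&tl=1 rather than ?ve=1&amp;tl=1.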
    
    This answer was selected by the asker as the best answer.