duanniu3385
2013-07-24 06:01

How do I parse HTML tags with a regular expression?

I want to use a regular expression to extract the contents of the following HTML tag, which I retrieve through cURL:

<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>

so that the output is "IND - 203/9 (49.4 Ovs)".

I have written the following code, but it is not working. Please help.

$one="<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>";
$five="~(?<=<span class='ui-allscores'>)[.]*(?=</br></span>)~";
preg_match_all($five,$one,$ui);
print_r($ui);

3 answers

  • dongxie548548 2013-07-24 06:07
    Accepted answer

    Try this one:

    $string = "<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>";
    

    Any span tag, whatever its attributes:

    preg_match('/<span[^>]*>(.*?)<\/span>/si', $string, $matches);
    

    Only the span with class 'ui-allscores':

    preg_match("/<span class='ui-allscores'>(.*?)<\/span>/si", $string, $matches);
    
    // Output
    array (size=2)
      0 => string '<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>' (length=56)
      1 => string 'IND - 203/9 (49.4 Ovs)' (length=22)
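
    The specific-tag pattern above could be wrapped in a small helper, roughly as sketched here. The function name extractSpanText and its null return for "no match" are assumptions made for illustration and are not part of the original answer.

    ```php
    <?php
    // Hypothetical helper: returns the text of the first <span> with the given
    // class attribute (single-quoted, as in the question), or null if none matches.
    function extractSpanText(string $html, string $class): ?string
    {
        // preg_quote() escapes regex metacharacters in the class name; '/' is the delimiter.
        $pattern = "/<span class='" . preg_quote($class, '/') . "'>(.*?)<\/span>/si";
        if (preg_match($pattern, $html, $matches) === 1) {
            return $matches[1]; // group 1 holds the text between the tags
        }
        return null;
    }

    $string = "<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>";
    var_dump(extractSpanText($string, 'ui-allscores')); // string(22) "IND - 203/9 (49.4 Ovs)"
    var_dump(extractSpanText($string, 'missing'));      // NULL
    ```

    Checking the return value of preg_match() avoids reading $matches[1] when the page does not contain the tag.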
    
  • donxbje866688 2013-07-24 06:09

    If you simply want to remove the HTML tags, use PHP's built-in strip_tags function.

    See also another answer on removing HTML tags: "Strip all HTML tags, except allowed".
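
    A minimal sketch of that approach, using the string from the question. The second input (with <p> and <b>) is made up here only to show the optional allowed-tags argument.

    ```php
    <?php
    $string = "<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>";

    // strip_tags() drops every tag and keeps the text between them.
    $text = strip_tags($string);
    echo $text, "\n"; // IND - 203/9 (49.4 Ovs)

    // The optional second argument lists tags to keep (illustrative input).
    $kept = strip_tags("<p>Score: <b>203/9</b></p>", "<b>");
    echo $kept, "\n"; // Score: <b>203/9</b>
    ```

    Note that on a whole page fetched with cURL this returns the text of every element, not only the one span, so it fits best when the fragment has already been isolated.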

  • duangaoe9401 2013-07-24 06:12

    The problem with your regex is the [.] part. Inside a character class the dot loses its special meaning, so [.] matches only a literal dot. Just remove the square brackets.

    $five="~(?<=<span class='ui-allscores'>).*(?=</br></span>)~";
    

    The next problem is the greediness of *. You can make it lazy, so that it stops at the first possible match, by putting a ? after it.

    $five="~(?<=<span class='ui-allscores'>).*?(?=</br></span>)~";
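
    The difference can be checked against the string from the question. One caveat: that sample contains no </br> before </span>, so a lookahead that requires it cannot match there. The sketch below therefore assumes the lookahead should be just </span>; that is an assumption about the real page, not something the question confirms.

    ```php
    <?php
    $one = "<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>";

    // [.]* matches only literal dots, so the text between the tags can never match.
    $broken = "~(?<=<span class='ui-allscores'>)[.]*(?=</span>)~";
    // Lazy .*? but still requiring </br>, which the sample string does not contain.
    $withBr = "~(?<=<span class='ui-allscores'>).*?(?=</br></span>)~";
    // Assumed fix: lazy .*? with a lookahead for </span> only.
    $fixed  = "~(?<=<span class='ui-allscores'>).*?(?=</span>)~";

    var_dump(preg_match($broken, $one, $m1)); // int(0)
    var_dump(preg_match($withBr, $one, $m2)); // int(0)
    var_dump(preg_match($fixed, $one, $m3));  // int(1)
    echo $m3[0], "\n";                        // IND - 203/9 (49.4 Ovs)
    ```

    Because the tags sit in lookarounds, the whole match ($m3[0]) is already the inner text and no capture group is needed.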
    

    But the overall point is: you should most probably use an HTML parser for this job!

    See "How do you parse and process HTML/XML in PHP?"
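
    As a rough sketch of the parser route, PHP's bundled DOM extension can do the same extraction. This assumes ext-dom is enabled and that the class attribute is exactly 'ui-allscores'; the variable names are illustrative.

    ```php
    <?php
    $html = "<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>";

    $doc = new DOMDocument();
    // Real-world pages are rarely valid; collect libxml warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Select every <span> whose class attribute is exactly "ui-allscores".
    $nodes = $xpath->query("//span[@class='ui-allscores']");

    $scores = [];
    foreach ($nodes as $node) {
        $scores[] = trim($node->textContent);
    }
    print_r($scores);
    ```

    Unlike the regex, this keeps working if the page changes the quoting style or attribute order inside the tag, as long as the class value stays the same.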
