dongmi6102 2011-01-06 22:34
Status: answer accepted

PHP: How do I search an HTML document and extract certain strings?

I have an HTML document that I saved as a .txt file. I want to extract each string that follows /user/ and build a comma-separated list of all the extracted strings. So every time "/user/boy34" appears in this file, I would like to extract the "boy34" part. I'm really new to PHP, but I've been reading about the preg_match_all() function and I think that's what I need to use.

Here's what I've come up with so far, but it doesn't work:

<?php
$str = file_get_contents("comment.txt");
preg_match_all ('/^(user\/)\/[A-Z0-9][A-Z0-9_-]+\"$/i', $str, $preg);
print_r ($preg);
?>

The output I get from this is:

Array ( [0] => Array ( ) [1] => Array ( ) ) 

Can somebody please help me?

2 answers

  • douzen3516 2011-01-06 22:42

    Using ^ in a regex anchors the match to the start of the subject (or, with the m flag, to the start of a line), and the $ at the end anchors it to the end. So your pattern can never match unless the entire subject (or the entire line, in multiline mode) is nothing but the user path. On top of that, (user\/)\/ requires the text user// with no leading slash, which never occurs in /user/boy34. The simplest fix is to drop the anchors.
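
    To make the anchor behaviour concrete, here is a minimal sketch. The sample string and the variable names are made up for illustration; only preg_match_all() and the m flag come from PHP itself.

    ```php
    <?php
    // Hypothetical sample input: three lines, only the middle one is exactly "/user/boy34".
    $text = "<a href=\"x\">\n/user/boy34\n</a>";

    // Without the m flag, ^ and $ anchor to the whole subject, so nothing matches.
    $plain = preg_match_all('/^\/user\/\w+$/', $text, $m1);

    // With the m flag, ^ and $ anchor to each line, so the middle line matches.
    $multi = preg_match_all('/^\/user\/\w+$/m', $text, $m2);

    // With no anchors at all, the pattern can match anywhere in the text.
    $free = preg_match_all('/\/user\/\w+/', $text, $m3);

    echo "$plain $multi $free\n"; // prints "0 1 1"
    ```

    In real HTML the path usually sits in the middle of a line inside an attribute, so only the unanchored form is likely to find it.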

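    You can also use the shorthand classes, such as \w (word characters: A-Z, a-z, 0-9 and _). Note that \w does not include the hyphen that your original character class allowed.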

    Try out this regex pattern: /"\/user\/(\w+)"/im

    If you post an example of your HTML, I can actually test this out and get you a working regex pattern.

    --- UPDATE ---

    I tested using this HTML:

    <html>
      <body>
        <a href="/user/boy30" />
        <a href="/user/boy31" />
        <a href="/user/boy32" />
      </body>
    </html>
    

    and the regex mentioned above, and I got it to work in this very simple test. I used this site to test: http://www.spaweditor.com/scripts/regex/index.php

    Here were my results:

    Array
    (
        [0] => Array
            (
                [0] => "/user/boy30"
                [1] => "/user/boy31"
                [2] => "/user/boy32"
            )
    
        [1] => Array
            (
                [0] => boy30
                [1] => boy31
                [2] => boy32
            )
    
    )
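
    The question asked for a comma-separated list, and the captured names in index 1 of the result can be joined with implode(). A minimal sketch, assuming an inline string stands in for file_get_contents("comment.txt") and that duplicate names should be dropped (the array_unique() step is an addition, not part of the original test):

    ```php
    <?php
    // Stand-in for file_get_contents("comment.txt"); same links as the test HTML above.
    $html = '<a href="/user/boy30" /><a href="/user/boy31" /><a href="/user/boy32" />';

    // $matches[0] holds the full matches, $matches[1] the captured user names.
    preg_match_all('/"\/user\/(\w+)"/im', $html, $matches);

    // Assumption: repeated names are unwanted, so remove them before joining.
    $users = array_unique($matches[1]);

    $list = implode(',', $users);
    echo $list, "\n"; // prints "boy30,boy31,boy32"
    ```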
    

    --- Regex Explanation ---

    • / Opening delimiter. PHP requires every pattern to be wrapped in delimiters, and / is the usual choice
    • " Looks for a double-quote character
    • \/user\/ Searches for /user/ (the forward slashes must be escaped because / is also the delimiter)
    • ( Starts a capture group. Whatever the group matches is returned separately in your results (leaving the parentheses out will not break the regex, it will still find the matches, but the group lets us extract "boy32" directly)
      • \w+ Searches for 1 or more (+ means "1 or more") word characters (equivalent to [a-zA-Z0-9_])
      • ) Ends the capture group
    • " Looks for another double-quote character
    • / Closing delimiter; any flags follow it
      • i Flag: Case-Insensitive Mode (here it only affects the literal "user", since \w already covers both cases)
      • m Flag: Multi-Line Mode (makes ^ and $ match at the start and end of each line instead of only at the start and end of the whole subject; this pattern has no anchors, so the flag is harmless but not required)
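
    The escaping in \/user\/ is only needed because / doubles as the delimiter. As a hedged variation, the sketch below uses # as the delimiter and widens the character class to [\w-] so that hyphenated names (which the question's original character class allowed) are captured too. The sample markup is hypothetical.

    ```php
    <?php
    // Hypothetical input with a hyphenated name, which \w+ alone would not match.
    $html = '<a href="/user/boy-34"></a> <a href="/user/girl_7"></a>';

    // With # as the delimiter, the slashes in /user/ need no escaping.
    // [\w-]+ accepts letters, digits, underscores and hyphens.
    preg_match_all('#"/user/([\w-]+)"#', $html, $matches);

    print_r($matches[1]); // the captured names: boy-34 and girl_7
    ```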