dongmi6102 2011-01-06 22:34

PHP: How do I search an HTML document and extract certain strings in PHP?

I have an HTML document that I saved as a .txt file. I want to extract each string following /user/ and make a comma-separated list of all the extracted strings. So every time there's a "/user/boy34" in this txt file, I would like to extract the "boy34" part. I'm really new to PHP, but I've been reading about the preg_match_all() function and I think that's what I need to use.

Here's what I've come up with so far, but it doesn't work:

<?php
$str = file_get_contents("comment.txt");
preg_match_all ('/^(user\/)\/[A-Z0-9][A-Z0-9_-]+\"$/i', $str, $preg);
print_r ($preg);
?>

The output I get from this is:

Array ( [0] => Array ( ) [1] => Array ( ) ) 

Can somebody please help me?


2 answers

  • douzen3516 2011-01-06 22:42

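    Using ^ in a regex anchors the match to the very start of the subject string, and the $ at the end anchors it to the very end. So you will never find anything unless the entire file is nothing but the text you are matching. The m flag (multiline mode) relaxes this so that ^ and $ match at the start and end of each line, but even then a line would have to consist of the user path and nothing else. Your pattern also looks for user// (no leading slash, two trailing ones) rather than /user/, and it has no capture group around the name.

    You should also use the shorthand classes, like \w (word characters, equivalent to [A-Za-z0-9_]).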
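
    A small sketch of how the anchors behave, using a made-up three-line string (the sample text is an assumption, not the asker's file):

```php
<?php
// Hypothetical sample: the user path sits on its own line in the middle.
$text = "first line\n/user/boy34\nlast line";

// Without the m flag, ^ and $ refer to the whole string, so nothing matches.
var_dump(preg_match('/^\/user\/\w+$/', $text));  // int(0)

// With the m flag, ^ and $ also match at each line boundary.
var_dump(preg_match('/^\/user\/\w+$/m', $text)); // int(1)

// Without anchors, the match can sit anywhere in the text.
var_dump(preg_match('/\/user\/\w+/', $text));    // int(1)
```

    In real HTML the path is surrounded by other markup on the same line, so dropping the anchors is usually the simpler route.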

    Try out this regex pattern: /"\/user\/(\w+)"/im

    If you post an example of your HTML, I can actually test this out and get you a working regex pattern.

    --- UPDATE ---

    I tested using this HTML:

    <html>
      <body>
        <a href="/user/boy30" />
        <a href="/user/boy31" />
        <a href="/user/boy32" />
      </body>
    </html>
    

    and the regex mentioned above, and I got it to work in this very simple test. I used this site to test: http://www.spaweditor.com/scripts/regex/index.php

    Here were my results:

    Array
    (
        [0] => Array
            (
                [0] => "/user/boy30"
                [1] => "/user/boy31"
                [2] => "/user/boy32"
            )
    
        [1] => Array
            (
                [0] => boy30
                [1] => boy31
                [2] => boy32
            )
    
    )
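
    To get from that array to the comma-separated list asked for in the question, something like the following might do. It is only a sketch: extractUserNames is a hypothetical helper name, and the inline sample string stands in for file_get_contents("comment.txt").

```php
<?php
// Hypothetical helper (not from the original post): returns every name that
// follows "/user/ inside double quotes, joined by commas.
function extractUserNames(string $html): string
{
    // Same pattern as suggested above; group 1 captures just the name.
    preg_match_all('/"\/user\/(\w+)"/im', $html, $matches);

    // $matches[0] holds the full matches, $matches[1] only the captured names.
    return implode(',', $matches[1]);
}

// Inline sample standing in for the contents of comment.txt.
$html = '<a href="/user/boy30" /><a href="/user/boy31" /><a href="/user/boy32" />';

echo extractUserNames($html), "\n"; // boy30,boy31,boy32
```

    If the same user can appear more than once and you only want each name once, wrapping $matches[1] in array_unique() before implode() would handle that.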
    

    --- Regex Explanation ---

    • / Opening delimiter. PHP's preg functions require one; / is the usual choice
    • " Looks for a double-quote character
    • \/user\/ Searches for /user/ (the forward slashes must be escaped because / is the delimiter)
    • ( Starts a capture group. Anything between the parentheses is returned separately in your results (leaving the parentheses out will not break the regex, it will still find the matches, but the group lets us extract "boy32" directly)
      • \w+ Searches for 1 or more (+ means "1 or more") word characters (equivalent to [a-zA-Z0-9_])
      • ) Ends the group started before
    • " Looks for another double-quote character
    • / Closing delimiter, required at the end of the pattern and before any flags
      • i Flag: Case-Insensitive Mode
      • m Flag: Multi-Line Mode (makes ^ and $ match at the start and end of every line instead of only at the start and end of the whole string; this pattern has no anchors, so the flag is optional here)
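
    Since / is only a delimiter, picking a different one avoids the escaping altogether. A variant sketch, assuming user names may also contain hyphens (the [A-Z0-9_-] class in the question suggests so, and \w alone does not match a hyphen); the sample names are made up:

```php
<?php
// Assumption: names may contain hyphens, as the question's character class allows.
$html = '<a href="/user/boy-34" /> <a href="/user/Girl_7" />';

// With '#' as the delimiter, the forward slashes need no escaping.
// [\w-]+ matches word characters plus a literal hyphen.
preg_match_all('#"/user/([\w-]+)"#', $html, $matches);

echo implode(',', $matches[1]), "\n"; // boy-34,Girl_7
```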
    This answer was selected as the best answer by the asker.