doujiang7258 2012-12-04 00:13

preg_match error when matching links

I am trying to get a preg match working.

I have come up with this:

preg_match_all('<a href="(.*?)">', $page, $result);

but the output is:

Array
(
    [0] => Array
     (
        [0] => a href="/stuff"
        [1] => a href="/stuffstuffstuff"

         and much more of this.

I want to remove the a href, the slashes and the quotes, and keep only the content. I've tried a lot, but those things keep coming back. Any help would be appreciated.

Thanks guys


1 answer

  • donglinyi4313 2012-12-04 00:47

    First, please do NOT try to parse arbitrary HTML with regex. It will break sooner or later. Regex is not a tool for parsing HTML; it cannot parse it correctly. Three simple examples:

    <a href='stuff'> (different quotes)
    <!-- <a href="stuff">-->
    <a style='something' href="stuff">
    

    These will break your application. There are infinitely many other inputs that will not work and will break it too. Not even Chuck Norris can parse HTML with regex correctly. No one can!
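    To make the point concrete, here is a minimal sketch that runs a naive slash-delimited pattern over the three examples above. The array keys and variable names are made up for illustration only.

```php
<?php
// Hypothetical sample input: the three problem cases listed above.
$samples = [
    'different quotes' => "<a href='stuff'>",
    'commented out'    => '<!-- <a href="stuff">-->',
    'extra attribute'  => "<a style='something' href=\"stuff\">",
];

// A naive link pattern that only knows one exact way of writing the tag.
$pattern = '/<a href="(.*?)">/';

$counts = [];
foreach ($samples as $label => $html) {
    // preg_match_all returns the number of full-pattern matches.
    $counts[$label] = preg_match_all($pattern, $html, $m);
    echo $label, ': ', $counts[$label], " match(es)\n";
}
// different quotes: 0 match(es)  -> a real link is missed
// commented out: 1 match(es)     -> a commented-out link is reported
// extra attribute: 0 match(es)   -> a real link is missed
```

    Two real links are missed and one link inside a comment is picked up, and each patch to the pattern just moves the problem to the next variation.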

    But I assume you already know that, and this is just a small, limited set of known HTML that isn't going to be released to the public, so let's get back to your question:

    preg_match_all expects the regex to be wrapped in delimiter characters, and it matches whatever you write between them. If you write

    '<a href="(.*?)">' 
    

    as the regex, it treats the '<' at the beginning (and the matching '>' at the end) as delimiters, so they are not part of the pattern and are not matched. Put slashes (or another delimiter character) around it:

    preg_match_all('/<a href="(.*?)">/', $page, $result);
    

    Now, it's going to match like:

    [0] => <a href="/stuff">
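    A small side-by-side sketch of the delimiter effect, using a one-link sample string ($page here is a made-up value, not your real page):

```php
<?php
// Hypothetical sample input containing a single link.
$page = '<a href="/stuff">';

// No explicit delimiters: PHP takes < ... > as a bracket-style delimiter
// pair, so the effective pattern is only:  a href="(.*?)"
preg_match_all('<a href="(.*?)">', $page, $bare);

// Explicit slash delimiters: the angle brackets are now part of the pattern.
preg_match_all('/<a href="(.*?)">/', $page, $delimited);

echo $bare[0][0], "\n";       // a href="/stuff"
echo $delimited[0][0], "\n";  // <a href="/stuff">
```

    The first call reproduces the output from the question, the second one includes the angle brackets in the full match.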
    

    But you want only the '/stuff'. $result is an array: $result[0] holds everything the whole regex matched, $result[1] holds what the first () group matched, $result[2] would hold what a second () group matched, and so on. So look in $result[1]; you should find what you want there.
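    Roughly, reading the captured group could look like the sketch below. The $page value is invented sample data, and the trim() step is only an assumption about what "remove the slashes" means in your case.

```php
<?php
// Hypothetical sample input with two links.
$page = '<a href="/stuff">one</a> <a href="/stuffstuffstuff">two</a>';

preg_match_all('/<a href="(.*?)">/', $page, $result);

// $result[0] holds the full matches, $result[1] the first capture group.
$hrefs = $result[1];

// Optional: strip leading and trailing slashes from each captured value.
$names = array_map(function ($href) {
    return trim($href, '/');
}, $hrefs);

print_r($hrefs);  // '/stuff' and '/stuffstuffstuff'
print_r($names);  // 'stuff' and 'stuffstuffstuff'
```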

    This answer was accepted by the asker as the best answer.
