dongwuwei0718 2010-11-06 02:39

Regex headaches with various links and href delimiters (" and ')

I want to match the following link structures with preg_match_all in PHP:

<a garbage href="http://this.is.a.link.com/?query=this has invalid spaces" possible garbage>
<a garbage href='http://this.is.a.link.com/?query=this also has has invalid spaces' possible garbage>
<a garbage href=http://this.is.a.link.com/?query=no_spaces_but_no_delimiters possible garbage>
<a garbage href=http://this.is.a.link.com/?query=no_spaces_but_no_delimiters>

I can get the " and ' delimited URLs by doing

'#<a[^>]*?href=("|\')(.*?)("|\')#is'

Or I can get all three forms, but not when the first two contain spaces, with:

'#<a[^>]*?href=("|\')?(.*?)[\s\"\'>]#is'

How can I formulate this so that it picks up " and ' delimited URLs that may contain spaces, and also properly encoded URLs without delimiters?

5 answers

  • dougou1127 2010-11-06 02:50

    OK, this seems to work:

    '#<a[^>]*?href=((["\'][^\'"]+["\'])|([^"\'\s>]+))#is'
    

    ($matches[1] contains the URLs.)

    The only annoyance is that quoted URLs still have their quotes attached, so you'll have to strip them off:

    // $match is a single entry taken from $matches[1]
    $first = substr($match, 0, 1);
    if ($first === '"' || $first === "'") {
        $match = substr($match, 1, -1); // drop the surrounding quotes
    }
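
As a possible variation (not part of the original answer), the quote stripping can be avoided by letting each alternative capture only the URL itself. The sketch below assumes a PCRE version that supports branch-reset groups `(?|...)`, so that all three alternatives store their capture in group 1. The `$html`, `$pattern` and `$urls` names are illustrative, and the sample input is copied from the question.

```php
<?php
// Sample input from the question: double-quoted, single-quoted and unquoted hrefs.
$html = implode("\n", [
    '<a garbage href="http://this.is.a.link.com/?query=this has invalid spaces" possible garbage>',
    "<a garbage href='http://this.is.a.link.com/?query=this also has has invalid spaces' possible garbage>",
    '<a garbage href=http://this.is.a.link.com/?query=no_spaces_but_no_delimiters possible garbage>',
    '<a garbage href=http://this.is.a.link.com/?query=no_spaces_but_no_delimiters>',
]);

// (?| ... ) is a branch-reset group: whichever alternative matches,
// its capture is numbered 1.
//   "([^"]*)"        double-quoted value, may contain spaces and apostrophes
//   '([^']*)'        single-quoted value, may contain spaces and double quotes
//   ([^"'\s>]+)      unquoted value, ends at whitespace or >
$pattern = '#<a\b[^>]*?\bhref=(?|"([^"]*)"|\'([^\']*)\'|([^"\'\s>]+))#is';

preg_match_all($pattern, $html, $matches);

// Group 1 holds the URLs with no surrounding quotes.
$urls = $matches[1];
print_r($urls);
```

With the four sample links, `$urls` holds four entries, each without quotes; the first two keep their embedded spaces. Unlike the pattern above, each quoted alternative must close with the same quote character that opened it, so a double-quoted URL containing an apostrophe is still captured in full.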
    
    Selected by the asker as the best answer.