douzhang7728 2010-02-01 08:11
浏览 27
已采纳

正则表达式解析包含特定单词的链接

Taking this thread a step further, can someone tell me what the difference is between these two regular expressions? They both seem to accomplish the same thing: pulling a link out of html.

Expression 1:

'/(https?://)?(www.)?([a-zA-Z0-9_%]*)\b.[a-z]{2,4}(.[a-z]{2})?((/[a-zA-Z0-9_%])+)?(.[a-z])?/'

Expression 2:

'/<a.*?href\s*=\s*["\']([^"\']+)[^>]*>.*?<\/a>/si'

Which one would be better to use? And how could I modify one of those expressions to match only links that contain certain words, and to ignore any matches that do not contain those words?

Thanks.

  • 写回答

4条回答 默认 最新

  • douwen2158 2010-02-01 08:41
    关注

    The difference is that expression 1 looks for valid and full URIs, following the specification. So you get all full urls that are somewhere inside of the code. This is not really related to getting all links, because it doesn't match relative urls that are very often used, and it gets every url, not only the ones that are link targets.

    The second looks for a tags and gets the content of the href attribute. So this one will get you every link. Except for one error* in that expression, it is quite safe to use it and it will work good enough to get you every link – it checks for enough differences that can appear, such as whitespace or other attributes.

    *However there is one error in that expression, as it does not look for the closing quote of the href attribute, you should add that or you might match weird things:

    /<a.*?href\s*=\s*["\']([^"\'>]+)["\'][^>]*>.*?<\/a>/si
    

    edit in response to the comment:

    To look for word inside of the link url, use:

    /<a.*?href\s*=\s*["\']([^"\'>]*word[^"\'>]*)["\'][^>]*>.*?<\/a>/si
    

    To look for word inside of the link text, use:

    /<a.*?href\s*=\s*["\']([^"\'>]+)["\'][^>]*>.*?word.*?<\/a>/si
    
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(3条)

报告相同问题?

悬赏问题

  • ¥20 关于#stm32#的问题:需要指导自动酸碱滴定仪的原理图程序代码及仿真
  • ¥20 设计一款异域新娘的视频相亲软件需要哪些技术支持
  • ¥15 stata安慰剂检验作图但是真实值不出现在图上
  • ¥15 c程序不知道为什么得不到结果
  • ¥40 复杂的限制性的商函数处理
  • ¥15 程序不包含适用于入口点的静态Main方法
  • ¥15 素材场景中光线烘焙后灯光失效
  • ¥15 请教一下各位,为什么我这个没有实现模拟点击
  • ¥15 执行 virtuoso 命令后,界面没有,cadence 启动不起来
  • ¥50 comfyui下连接animatediff节点生成视频质量非常差的原因