du4373 2013-01-15 22:49

PHP regex to parse the DOM and get URLs [duplicate]

Possible Duplicate:
Grabbing the href attribute of an A element
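This question is marked as a duplicate of one about grabbing `href` attributes, so the parser-based route may be worth sketching. What follows is a minimal sketch using PHP's DOM extension (`DOMDocument`) rather than a regex. It is not necessarily what the linked question's answers do, and it makes some assumptions: ext-dom is enabled, and only `<a href>` values are wanted, unlike the regex below, which also scans `src`, `action` and similar attributes. The helper name `extractHrefs` is hypothetical.

```php
<?php
// Hypothetical helper: collect usable href values from <a> elements using
// PHP's DOM extension (ext-dom) instead of a regular expression.
function extractHrefs(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML often triggers libxml warnings; collect them silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // The parser has already decoded the attribute, quotes and all.
        $href = trim($anchor->getAttribute('href'));
        // Skip missing/empty values, fragment-only links and javascript: links.
        if ($href === '' || $href[0] === '#' || stripos($href, 'javascript:') === 0) {
            continue;
        }
        $hrefs[] = $href;
    }
    return $hrefs;
}

$html = '<html><head><title>test</title></head><body><a href="http://www.example.com/">works</a>, <a href="javascript:dothis(\'ok\');">breaks</a></body></html>';
print_r(extractHrefs($html));
```

Because the parser reads the whole quoted attribute value, the single quotes inside `dothis('ok')` cannot cut the value short. Skipping `javascript:` becomes a plain string check instead of a regex assertion.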

I have a problem with a regular expression. It parses most links correctly, but it breaks on URLs that contain JavaScript. For example, if the HTML content has this href:

<a href="javascript:fixIt('yes')">anchor text</a>

it will not parse the URL correctly. Instead it captures only half of the URL and outputs "javascript:fixIt('". So I tried to make it skip URLs that start with "javascript:", but that isn't working correctly either. I'm at a loss; I have been on this for almost 4 hours now.

This is the regex I'm working with:

/[\s]+(src|href|url|location|background|action)[\s]*=[\s]*([\'\"\`])?[\s]*([^\'\"\`\s>]+)([\'\"\`>])?/i

And here is a test sample:

<?php
$html = '<html><head><title>test</title></head><body><a href="http://www.example.com/">works</a>, <a href="javascript:dothis(\'ok\');">breaks</a></body></html>';
$pattern = '/[\s]+(src|href|url|location|background|action)[\s]*=[\s]*([\'\"\`])?[\s]*([^\'\"\`\s>]+)([\'\"\`>])?/i';
// Dump every match together with its capture groups (group 3 holds the URL).
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
var_dump($matches);
?>

Thanks.
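The truncation comes from the third capture group. The value class `[^'"`\s>]+` excludes quote characters, so the match stops at the first `'` inside `dothis('ok')`, and the optional fourth group then consumes that quote. The sketch below runs the same pattern with `preg_match_all` and prints only group 3 so the cut-off value is visible.

```php
<?php
// The pattern from the question, unchanged.
$pattern = '/[\s]+(src|href|url|location|background|action)[\s]*=[\s]*([\'\"\`])?[\s]*([^\'\"\`\s>]+)([\'\"\`>])?/i';
$html = '<html><head><title>test</title></head><body><a href="http://www.example.com/">works</a>, <a href="javascript:dothis(\'ok\');">breaks</a></body></html>';

// Default PREG_PATTERN_ORDER: $matches[3] lists every group-3 capture.
preg_match_all($pattern, $html, $matches);

// Group 3 cannot cross a quote, so the javascript: value ends at "dothis(".
print_r($matches[3]);
```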


1 answer

  • dongxiaoguang9108 2013-01-15 22:55
    /[\s]+(src|href|url|location|background|action)[\s]*=[\s]*([\'\"\`])?[\s]*(?!(?:#|javascript\s*:))([^"\']+)([\'\"\`>])?/i
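A quick way to check the pattern above is to run it against the question's test sample. The negative lookahead `(?!(?:#|javascript\s*:))` rejects values that begin with `#` or `javascript:`. The value group `[^"']+` then runs until the next quote. The opening quote is optional, yet the engine cannot slip past the lookahead by leaving the quote unmatched, because the value group cannot start on a quote character.

```php
<?php
// The pattern from the answer, written as a PHP single-quoted string.
$pattern = '/[\s]+(src|href|url|location|background|action)[\s]*=[\s]*([\'\"\`])?[\s]*(?!(?:#|javascript\s*:))([^"\']+)([\'\"\`>])?/i';
$html = '<html><head><title>test</title></head><body><a href="http://www.example.com/">works</a>, <a href="javascript:dothis(\'ok\');">breaks</a></body></html>';

preg_match_all($pattern, $html, $matches);

// Only the http:// value survives; the javascript: href produces no match.
print_r($matches[3]);
```

One trade-off: `[^"']+` no longer stops at whitespace or `>`. An unquoted value such as `href=foo>text` would therefore run past the end of the tag until the next quote.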
    
