du4373 2013-01-15 22:49
Views: 56

PHP regular expression to parse the DOM and get URLs [duplicate]

Possible Duplicate:
Grabbing the href attribute of an A element

I have a problem with a regular expression. It works well and parses most links, but it breaks on URLs that contain JavaScript. For example, if the HTML content has this href:

<a href="javascript:fixIt('yes')">anchor text</a>

it will not parse the URL correctly. Instead it captures only part of the URL and outputs "javascript:fixIt('". I tried to make it skip URLs that start with "javascript:", but that is not working correctly either. I am at a loss; I have been on this for almost 4 hours now.

This is the regex I am working with:

/[\s]+(src|href|url|location|background|action)[\s]*=[\s]*([\'\"\`])?[\s]*([^\'\"\`\s>]+)([\'\"\`>])?/i

And here is a test sample:

<?php
// Sample document: one ordinary link and one javascript: link.
$html = '<html><head><title>test</title></head><body><a href="http://www.example.com/">works</a>, <a href="javascript:dothis(\'ok\');">breaks</a></body></html>';
// Same pattern as above, written as a PHP string.
$pattern = '/[\s]+(src|href|url|location|background|action)[\s]*=[\s]*([\'\"\`])?[\s]*([^\'\"\`\s>]+)([\'\"\`>])?/i';
// Dump every match; the callback returns the match unchanged.
preg_replace_callback($pattern, function($r) { var_dump($r); return $r[0]; }, $html);
?>

Thanks.
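
The truncation described above follows from the value class in the pattern: it excludes every quote character, so a single quote inside a double-quoted value ends the capture. A minimal sketch, assuming only quoted attribute values need to be handled, uses a backreference to require the same quote that opened the value, and filters out javascript: URLs afterwards. The pattern and the names $quoted, $all and $urls are illustrative and not part of the original post.

```php
<?php
// Sample document from the question.
$html = '<html><head><title>test</title></head><body>'
      . '<a href="http://www.example.com/">works</a>, '
      . '<a href="javascript:dothis(\'ok\');">breaks</a></body></html>';

// Hypothetical pattern: group 2 is the opening quote and \2 requires the
// same character to close the value, so a ' inside "..." is allowed.
$quoted = '/\s(src|href|url|location|background|action)\s*=\s*(["\'])(.*?)\2/is';

preg_match_all($quoted, $html, $matches, PREG_SET_ORDER);

$all  = [];  // every captured attribute value
$urls = [];  // values that are not javascript: pseudo-URLs
foreach ($matches as $m) {
    $value = trim($m[3]);
    $all[] = $value;
    // Filter after the full value has been captured.
    if (stripos($value, 'javascript:') === 0) {
        continue;
    }
    $urls[] = $value;
}

print_r($urls);
```

Unquoted attribute values are not matched by this sketch; they would need a separate alternative in the pattern.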


1 answer

  • dongxiaoguang9108 2013-01-15 22:55
    /[\s]+(src|href|url|location|background|action)[\s]*=[\s]*([\'\"\`])?[\s]*(?!(?:#|javascript\s*:))([^"\']+)([\'\"\`>])?/i
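
To check the pattern above against the sample from the question, it can be run with preg_match_all. This is only a sketch: the pattern is copied unchanged from the answer, while the variable names $pattern, $matches and $found are assumptions made for illustration.

```php
<?php
// Sample document from the question.
$html = '<html><head><title>test</title></head><body>'
      . '<a href="http://www.example.com/">works</a>, '
      . '<a href="javascript:dothis(\'ok\');">breaks</a></body></html>';

// Pattern from the answer: the negative lookahead rejects values that
// start with "#" or "javascript:" right after the opening quote.
$pattern = '/[\s]+(src|href|url|location|background|action)[\s]*=[\s]*([\'\"\`])?[\s]*(?!(?:#|javascript\s*:))([^"\']+)([\'\"\`>])?/i';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

// Group 3 holds the attribute value.
$found = array_map(function ($m) { return $m[3]; }, $matches);

print_r($found);
```

On this sample only http://www.example.com/ is collected; the javascript: link no longer matches at all. The same lookahead also skips values that begin with #.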
    