I have a problem with a regular expression. It parses most links correctly, but it breaks on URLs that contain JavaScript. For example, if the HTML content has this href:
<a href="javascript:fixIt('yes')">anchor text</a>
it will not parse the URL correctly; instead it captures only part of the URL and outputs "javascript:fixIt('". I tried to make it skip URLs that start with "javascript:", but that is not working correctly either. I'm at a loss; I have been on this for almost 4 hours now.
This is the regex I'm working with:
/[\s]+(src|href|url|location|background|action)[\s]*=[\s]*([\'\"\`])?[\s]*([^\'\"\`\s>]+)([\'\"\`>])?/i
And here is a test sample:
<?php
$html = '<html><head><title>test</title></head><body><a href="http://www.example.com/">works</a>, <a href="javascript:dothis(\'ok\');">breaks</a></body></html>';
$pattern = '/[\s]+(src|href|url|location|background|action)[\s]*=[\s]*([\'\"\`])?[\s]*([^\'\"\`\s>]+)([\'\"\`>])?/i';
// Dump every match; the callback returns the match unchanged so nothing is replaced.
preg_replace_callback($pattern, function ($r) { var_dump($r); return $r[0]; }, $html);
?>
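To narrow the symptom down, here is a smaller reproduction that prints only the captured attribute values (capture group 3) using `preg_match_all`. This is just a sketch: the helper name `extractAttributeValues` is made up for illustration, and the pattern is the same one as above, unchanged.

```php
<?php
// Hypothetical helper (the name is illustrative only): returns capture
// group 3, the attribute value, for every match of the pattern above.
function extractAttributeValues(string $html): array
{
    $pattern = '/[\s]+(src|href|url|location|background|action)[\s]*=[\s]*([\'\"\`])?[\s]*([^\'\"\`\s>]+)([\'\"\`>])?/i';
    preg_match_all($pattern, $html, $matches);
    return $matches[3];
}

$html = '<a href="http://www.example.com/">works</a>, <a href="javascript:dothis(\'ok\');">breaks</a>';

// The first value comes out whole; the second is cut off at the first
// single quote: "javascript:dothis(".
print_r(extractAttributeValues($html));
```

The value group excludes every quote character, so inside a double-quoted value it stops at the first single quote it meets. That seems to be where the truncation comes from.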
Thanks.