What's wrong with this regex to exclude the content of the title tag?
$plaintext = preg_match('#<title>(.*?)</title>#', $html);
$html holds the HTML code of the entire page.
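It sounds like you never got a working answer. preg_match() only returns 1 or 0 (whether the pattern matched), not the stripped text, so $plaintext never holds the page content. Let's remove the title tags with preg_replace() instead.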
Search: (?s)<title>.*?</title>
Replace: ""
Code:
$regex = "~(?s)<title>.*?</title>~";
$replaced = preg_replace($regex, "", $html);
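To make the difference concrete, here is a minimal sketch. The sample markup in $html is made up for illustration and is not from the question. It shows that preg_match() hands back a match count, while preg_replace() hands back the cleaned string.

```php
<?php
// Hypothetical sample page; stands in for the asker's full page source.
$html = "<html><head><title>My Page</title></head><body><p>Hello</p></body></html>";

// preg_match() reports whether the pattern matched: 1, 0, or false on error.
// The captured title text is only available through the third argument.
$found = preg_match('#<title>(.*?)</title>#', $html, $m);

// preg_replace() returns the modified string, which is what the question needs.
$replaced = preg_replace('~(?s)<title>.*?</title>~', '', $html);

var_dump($found);      // the match count, not the page text
echo $m[1], "\n";      // the text that was inside <title>
echo $replaced, "\n";  // the page with the title element removed
```

If you want the title text itself rather than the page without it, $m[1] from preg_match() is the value to use.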
Explain Regex
(?s) # set flags for this block (with . matching
# \n) (case-sensitive) (with ^ and $
# matching normally) (matching whitespace
# and # normally)
<title> # '<title>'
.*? # any character (0 or more times (matching
# the least amount possible))
</title> # '</title>'
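The two pieces of the pattern that matter most are (?s) and the lazy .*?. A small sketch of what each one changes, assuming made-up input strings (none of them come from the question):

```php
<?php
// Hypothetical input where the title spans two lines.
$html = "<head><title>First line\nSecond line</title></head>";

// Without (?s), "." does not match "\n", so the pattern finds nothing
// and preg_replace() returns the input unchanged.
$withoutFlag = preg_replace('~<title>.*?</title>~', '', $html);

// With (?s), "." also matches "\n", so the whole element is removed.
$withFlag = preg_replace('~(?s)<title>.*?</title>~', '', $html);

// Hypothetical input with two title elements and content between them.
$two = "<title>a</title><p>keep</p><title>b</title>";

// Lazy ".*?" stops at the first "</title>", so each element is removed separately.
$lazy = preg_replace('~(?s)<title>.*?</title>~', '', $two);

// Greedy ".*" runs to the last "</title>" and swallows everything in between.
$greedy = preg_replace('~(?s)<title>.*</title>~', '', $two);

var_dump($withoutFlag === $html); // true when nothing was replaced
var_dump($withFlag);
var_dump($lazy);
var_dump($greedy);
```

The pattern is still case-sensitive and expects a bare <title> tag. Adding the i flag or allowing attributes would be a separate change that the answer above does not cover.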