If you are trying to extract data from an HTML document, you should not use regular expressions.
Instead, you should use a DOM parser: that is exactly what they are made for.
In PHP, you would use the DOMDocument class, and its DOMDocument::loadHTML() method, to load the HTML content.
Then, you can work with methods such as DOMDocument::getElementById() and DOMDocument::getElementsByTagName().
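For instance, here is a minimal sketch of two such methods, DOMDocument::getElementsByTagName() and DOMDocument::getElementById(). The markup and the variable names are made up for illustration; they are not taken from your page.

```php
<?php
// Illustrative markup only (an assumption, not the asker's real page).
$html = '<html><body><p id="intro">hello</p><p>world</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// getElementsByTagName() returns a DOMNodeList of every matching element.
$texts = [];
foreach ($dom->getElementsByTagName('p') as $p) {
    $texts[] = trim($p->textContent);
}

// getElementById() returns a single DOMElement, or null when nothing matches.
$intro = $dom->getElementById('intro');
$introText = $intro !== null ? $intro->textContent : null;

var_dump($texts, $introText);
```

These methods are enough for simple lookups by tag or id; anything involving nesting or attribute conditions tends to be easier with XPath.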
You can even work with DOMXPath to execute XPath queries on your HTML content, which will allow you to search for pretty much anything in it.
In your case, I suppose that something like this should do the trick.
First, get your HTML content into a string (or use DOMDocument::loadHTMLFile()):
$html = <<<HTML
<p>hello</p>
<div>
    <div id="MustBeInThisId">
        <div class="ValueFromThisClass">
            The Value I need
        </div>
    </div>
</div>
HTML;
Then, load it into a DOMDocument instance:
$dom = new DOMDocument();
$dom->loadHTML($html);
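One thing to keep in mind: real-world pages are rarely valid, and loadHTML() reports parse problems as PHP warnings. A possible way to keep them out of your output, sketched here with a made-up HTML string, is to let libxml collect the errors internally and inspect them afterwards. Whether a given piece of markup is reported at all depends on the libxml version, so the sketch does not assume any particular error.

```php
<?php
// Made-up markup (an assumption): older libxml versions may flag the
// <section> tag as invalid, newer ones may accept it silently.
$html = '<section><div id="MustBeInThisId">text</div></section>';

// Collect parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Fetch whatever was recorded (an array of LibXMLError objects, possibly empty),
// then clear the buffer and restore the previous error-handling mode.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}
```

Restoring the previous setting at the end keeps the change local, so other code that relies on the default warnings is not affected.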
Instantiate a DOMXPath object, and use it to query your DOM object (my XPath expression might be a bit more complex than necessary, as I'm not really good with those):
$xpath = new DOMXPath($dom);
$items = $xpath->query('//div[@id="MustBeInThisId"]/div[@class="ValueFromThisClass"]');
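A side note on the class test: `@class="ValueFromThisClass"` compares the whole attribute value, so it stops matching as soon as the element carries a second class. If that can happen in your pages, a common XPath 1.0 idiom is to pad the attribute with spaces and look for the padded class name. The sketch below uses a made-up variant of the markup with an extra `box` class to show the difference.

```php
<?php
// Made-up variant of the markup (an assumption): the inner div has two classes.
$html = '<div id="MustBeInThisId"><div class="box ValueFromThisClass">The Value I need</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact comparison: the attribute value is "box ValueFromThisClass", so no match.
$exact = $xpath->query('//div[@id="MustBeInThisId"]/div[@class="ValueFromThisClass"]');

// Token comparison: normalize whitespace, pad with spaces, search for " name ".
$token = $xpath->query(
    '//div[@id="MustBeInThisId"]'
    . '/div[contains(concat(" ", normalize-space(@class), " "), " ValueFromThisClass ")]'
);

var_dump($exact->length, $token->length);
```

With this markup, the exact comparison finds no node while the token comparison finds one.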
And, finally, work with the results of that query:
if ($items->length > 0) {
    var_dump( trim( $items->item(0)->nodeValue ) );
}
And here is your result:
string 'The Value I need' (length=16)
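If you need this in more than one place, the steps above could be wrapped in a small helper. This is only a sketch: `extractFirstValue()` is a hypothetical name I made up, not something the DOM extension provides, and returning null when nothing matches is just one possible convention.

```php
<?php
// Hypothetical helper (name and signature are illustrative assumptions).
// Returns the trimmed text of the first node matching $query, or null.
function extractFirstValue(string $html, string $query): ?string
{
    // loadHTML() rejects an empty string, so bail out early.
    if ($html === '') {
        return null;
    }

    $dom = new DOMDocument();
    if (!$dom->loadHTML($html)) {
        return null;
    }

    $xpath = new DOMXPath($dom);
    $items = $xpath->query($query);

    // query() returns false for an invalid expression, or a possibly empty list.
    if ($items === false || $items->length === 0) {
        return null;
    }

    return trim($items->item(0)->nodeValue);
}

$html = <<<HTML
<div id="MustBeInThisId">
    <div class="ValueFromThisClass">
        The Value I need
    </div>
</div>
HTML;

$value = extractFirstValue($html, '//div[@id="MustBeInThisId"]/div[@class="ValueFromThisClass"]');
var_dump($value);
```

Called this way, `$value` holds the string `The Value I need`; a query that matches nothing gives null instead.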