I'm not good with regex. Here is my scenario:
I'm trying to extract some info from a webpage that contains several tables. Only some of the tables contain a unique URL (let's say "very/unique.key"), so the page looks like this:
<table ....>
(bunch of content)
</table>
<table ....>
(bunch of content)
</table>
<table ....>
(bunch of content + "very/unique.key" keyword)
</table>
<table ....>
(bunch of content)
</table>
<table ....>
(bunch of content + "very/unique.key" keyword)
</table>
What I want is to extract the content of every table that contains the "very/unique.key" keyword. Here are the patterns that I have tried:
$pattern = "#<table[^>]+>((?!\<table)(?=very\/unique\.key).*)<\/table>#i";
This returns nothing.
$pattern = "#<table[^>]+>((?!<table).*)<\/table>#i";
This returns everything from the first table's opening tag <table...> to the last table's closing tag </table>, even with the (?!<table) condition.
I'd appreciate help from anyone willing to look at this. Thanks.
--EDIT--
Here is the solution that I found, using DOM to loop through every table:
--My Solution--
$index = array(); // indexes of all the tables that contain the keyword
$DOM = new DOMDocument();
$DOM->loadHTMLFile("http://uni.corp/sub/sub/target.php?key=123");
$xpath = new DOMXPath($DOM);
$tables = $DOM->getElementsByTagName("table");
for ($n = 0; $n < $tables->length; $n++) {
    $rows = $tables->item($n)->getElementsByTagName("tr");
    for ($i = 0; $i < $rows->length; $i++) {
        $cols = $rows->item($i)->getElementsByTagName("td");
        for ($j = 0; $j < $cols->length; $j++) {
            $td = $cols->item($j); // grab the td element
            $img = $xpath->query('./img', $td)->item(0); // grab the first direct img child element
            if (isset($img)) {
                $image = $img->getAttribute('src'); // grab the source of the image
                echo $image;
                if ($image == "very/unique.key") {
                    echo $td->nodeValue, "\t";
                    // record each matching table once, so a match in table 0
                    // or two matches in one table cannot overwrite or duplicate entries
                    if (!in_array($n, $index, true)) {
                        $index[] = $n;
                    }
                    echo count($index) . " " . $n; // for troubleshooting
                }
            }
        }
        echo "<br/>";
    }
}
// loop that echoes only the tables that contain the keyword
$loop = sizeof($index);
for ($n = 0; $n < $loop; $n++) {
    $temp = $index[$n];
    $rows = $tables->item($temp)->getElementsByTagName("tr");
    for ($i = 0; $i < $rows->length; $i++) {
        $cols = $rows->item($i)->getElementsByTagName("td");
        for ($j = 0; $j < $cols->length; $j++) {
            echo $cols->item($j)->nodeValue, "\t";
            // process the extracted table content here
        }
        //echo "<br/>";
    }
}
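The two passes above could probably be collapsed into a single XPath query that selects only the tables containing the image. The following is a minimal sketch, not a drop-in replacement: the inline `$html` string and the `$found` array are made up for illustration (the real page would still be loaded with loadHTMLFile), and it assumes the tables are not nested, since an outer table would also match if an inner one holds the image.

```php
<?php
// Hypothetical sample markup standing in for the real page.
$html = '<table><tr><td>plain</td></tr></table>'
      . '<table><tr><td><img src="very/unique.key">first</td><td>a</td></tr></table>'
      . '<table><tr><td>other</td></tr></table>'
      . '<table><tr><td><img src="very/unique.key">second</td></tr></table>';

libxml_use_internal_errors(true); // keep parser warnings for sloppy HTML quiet
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select every table that has a td whose direct img child carries the key as src.
$tables = $xpath->query('//table[.//td/img[@src="very/unique.key"]]');

$found = array(); // one entry per matching table, each a list of cell texts
foreach ($tables as $table) {
    $cells = array();
    foreach ($xpath->query('.//td', $table) as $td) {
        $cells[] = $td->nodeValue;
    }
    $found[] = $cells;
}

print_r($found);
```

With this shape the index bookkeeping disappears, because the query result already contains only the wanted tables in document order.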
But personally, I'm still curious about the regex part, and I hope someone can find a regex pattern that solves this. Anyway, thanks to everyone who has helped or advised me on this (especially AbsoluteƵERØ).
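As a starting point for the regex side, a pattern along the following lines seems to fit the simplified sample at the top of the question. It is only a sketch under assumptions: the `$html` string is invented test data, tables are assumed not to be nested, and the keyword is assumed to appear literally in the markup. The idea is to re-check the lookahead at every character (instead of once after the opening tag, as in my attempts) so that a match can never run across another `<table` or `</table>`, and to use the `s` modifier so that `.` also matches newlines.

```php
<?php
// Hypothetical sample markup; two of the four tables contain the key.
$html = <<<HTML
<table class="a">
plain content
</table>
<table class="b">
content very/unique.key more
</table>
<table class="c">
plain again
</table>
<table class="d">
other very/unique.key
</table>
HTML;

// (?:(?!</?table\b).)*? consumes one character at a time, and only while
// that position does not start an opening or closing table tag.
$pattern = '#<table\b[^>]*>((?:(?!</?table\b).)*?very/unique\.key(?:(?!</?table\b).)*?)</table>#is';

preg_match_all($pattern, $html, $matches);

// $matches[0] holds the whole tables, $matches[1] only their inner content.
print_r($matches[0]);
```

On a large page this kind of pattern may run into PCRE's backtracking limit, which is one more reason the DOM approach is probably the safer choice for real HTML.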