From an article I extract, for example, the publication year, title and authors like this:
$aut = $xpath->query("//table[@cellpadding='6']//b[1]");
$authors = array();
foreach ($aut as $node) {
    $authors[] = $node->nodeValue;
}
$title = $doc->getElementsByTagName('h3')->item(1);
$publicationYear = $xpath->query("//p[1]//text()[(following::br)]")->item(0)->nodeValue;
$aux = $xpath->query("//p[2]//text()[(preceding::br)]");
$doi = substr($aux->item($aux->length - 1)->nodeValue, 4);
For every string (full name, year, title) I also need the chain of tags that enclose it, like this:
form1_table3_tbody1_tr1_td1_table5_tbody1_tr1_td2_p2
and the position of the string inside that tag, e.g. start: 163, end: 190. I only know that these pieces of information are grouped in certain tags, but I also need the index of each tag when it has siblings; that is why the example has table3 for the third child of form1. Is there a way of doing this in PHP, or at least in JavaScript?
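To make the requested path format concrete, here is a minimal sketch of how such a string could be built by walking up from a node with PHP's DOM extension. The function name `ancestorPath` and the small sample document are hypothetical. The sketch also assumes that the index counts only preceding siblings with the same tag name (so `p2` means the second `p`, not the second child); that reading is a guess and may need adjusting.

```php
<?php
// Hypothetical helper: returns a path such as "form1_table2_tr1_td2_p2".
// Assumption: the number after each tag name is its 1-based position among
// siblings that share the same tag name.
function ancestorPath(DOMNode $node): string
{
    $parts = [];
    // Start at the element itself, or at the parent when given a text node.
    $el = ($node instanceof DOMElement) ? $node : $node->parentNode;
    while ($el instanceof DOMElement) {
        $index = 1;
        for ($sib = $el->previousSibling; $sib !== null; $sib = $sib->previousSibling) {
            if ($sib instanceof DOMElement && $sib->nodeName === $el->nodeName) {
                $index++;
            }
        }
        array_unshift($parts, $el->nodeName . $index);
        $el = $el->parentNode; // becomes the DOMDocument at the root, which ends the loop
    }
    return implode('_', $parts);
}

// Made-up sample document, only for illustration.
$doc = new DOMDocument();
$doc->loadXML('<form><table/><table><tr><td/><td><p>a</p><p>b</p></td></tr></table></form>');
$xpath = new DOMXPath($doc);
$p = $xpath->query('//p[2]')->item(0);

echo ancestorPath($p), "\n"; // form1_table2_tr1_td2_p2
```

When the page is loaded with `loadHTML()` instead, the path would begin with `html1_body1_...`; stopping the loop at the `form` element would be one way to get a path that starts at `form1`.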
UPDATE: In the article I have:
...
<td valign="top">
<h3 class="blue-space">D-Lib Magazine</h3>
<p class="blue">November/December 2014<br>
Volume 20, Number 11/12<br><a href="http://www.dlib.org/dlib/november14/brook/../11contents.html" target="_blank">Table of Contents</a>
</p>
...
and $publicationYear from the first snippet gets the value 2014. The first snippet works fine. I need to create three more variables, e.g. $fathers = ...td1_p1, $start = 18, $end = 22.
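For the offsets, a rough sketch of one possible reading: `$start` and `$end` are taken as 0-based positions of the extracted string within the text content of the enclosing `<p>`, with `$end` exclusive. That interpretation is an assumption inferred from the 18/22 example. The markup is a trimmed copy of the snippet above, and `$needle` is a hypothetical stand-in for the value already extracted.

```php
<?php
// Trimmed copy of the sample markup (link removed), wrapped in a minimal table.
$html = '<table><tr><td valign="top">'
      . '<h3 class="blue-space">D-Lib Magazine</h3>'
      . '<p class="blue">November/December 2014<br>Volume 20, Number 11/12<br></p>'
      . '</td></tr></table>';

libxml_use_internal_errors(true); // keep parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$p = $xpath->query('//p[1]')->item(0);
$needle = '2014'; // hypothetical: the string that was extracted earlier

// strpos() works on bytes; for non-ASCII text, mb_strpos()/mb_strlen()
// would give character offsets instead.
$start = strpos($p->textContent, $needle);
$end = ($start === false) ? false : $start + strlen($needle);

var_dump($start, $end); // int(18) and int(22) for this sample
```

If the same string occurs more than once inside the tag, `strpos()` only reports the first occurrence, so a stricter approach would sum the lengths of the text nodes that precede the matched one.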