dongzhuandian3292 2015-07-08 10:48

How can I get the text inside an HTML tag, its offsets, and all of its parent tags up to the root in PHP?

From an article I extract, for example, the publication year, title and authors (plus the DOI) like this:

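// Setup (assumed): $html is a hypothetical variable holding the article's HTML source
$doc = new DOMDocument();
@$doc->loadHTML($html);   // @ suppresses warnings about malformed markup
$xpath = new DOMXPath($doc);
// Authors: first <b> child (per parent) inside tables with cellpadding="6"
$aut = $xpath->query("//table[@cellpadding='6']//b[1]");
$authors = array();
foreach ($aut as $node)
    $authors[] = $node->nodeValue;
// Title: second <h3> in the document (a DOMElement, not yet a string)
$title = $doc->getElementsByTagName('h3')->item(1);
// Year: first text node inside p[1] that has a <br> after it in document order
$publicationYear = $xpath->query("//p[1]//text()[(following::br)]")->item(0)->nodeValue;
// DOI: last text node inside p[2] that comes after a <br>; substr drops its first 4 characters
$aux = $xpath->query("//p[2]//text()[(preceding::br)]");
$doi = substr($aux->item($aux->length - 1)->nodeValue, 4);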

For each of these strings (the full name, year and title) I also need all the tags that enclose it, written like this:

form1_table3_tbody1_tr1_td1_table5_tbody1_tr1_td2_p2

as well as the position of the string inside that tag, e.g. start: 163, end: 190. I only know which tags group this information, but I also need each tag's index when it has siblings; that is why the example has table3 for the third child of form1. Is there a way to do this in PHP, or at least in JavaScript?
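One way to build such a path with PHP's DOM extension is sketched below: walk `parentNode` from the node up to the document element and, for each element, count the preceding element siblings with the same tag name. Counting only same-name siblings is an assumption, suggested by the update below, where the `<p>` that follows an `<h3>` is still labelled `p1`. The function name `tagPath` and its optional `$root` cut-off are hypothetical, not part of any library. Browsers insert implied `<tbody>` elements, so a path copied from browser developer tools may not match the tree PHP's parser builds.

```php
<?php
// Hypothetical helper: builds a path such as form1_table3_tr1_td1_p1
// from a node up to the document root (or up to, but excluding, $root).
function tagPath(DOMNode $node, ?DOMNode $root = null): string
{
    // For a text node, start from the element that contains it.
    if ($node instanceof DOMText) {
        $node = $node->parentNode;
    }
    $parts = [];
    while ($node instanceof DOMElement && $node !== $root) {
        // 1-based position among preceding siblings with the same tag name.
        $index = 1;
        for ($sib = $node->previousSibling; $sib !== null; $sib = $sib->previousSibling) {
            if ($sib instanceof DOMElement && $sib->nodeName === $node->nodeName) {
                $index++;
            }
        }
        array_unshift($parts, $node->nodeName . $index);
        $node = $node->parentNode;
    }
    return implode('_', $parts);
}

// Usage on a small made-up document.
$doc = new DOMDocument();
$doc->loadXML('<form><table/><table/><table><tr><td><h3>t</h3><p>x</p></td></tr></table></form>');
$xpath = new DOMXPath($doc);
$text = $xpath->query('//p/text()')->item(0);
echo tagPath($text), "\n"; // form1_table3_tr1_td1_p1
```

On a full page loaded with `loadHTML()`, passing the `<body>` element as `$root` drops the leading `html1_body1_` segments, so the path starts at `form1` as in the example.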

UPDATE: In the article I have:

...
<td valign="top"> 
<h3 class="blue-space">D-Lib Magazine</h3>
<p class="blue">November/December 2014<br>
Volume 20, Number 11/12<br><a href="http://www.dlib.org/dlib/november14/brook/../11contents.html" target="_blank">Table of Contents</a>
</p>
...

With the first code, $publicationYear gets the value 2014, so that part works fine. Now I need three more variables, for example $fathers = ...td1_p1, $start = 18 and $end = 22.
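For the snippet above, a minimal sketch of the three values: keep the `DOMText` node returned by the existing XPath query instead of only its `nodeValue`, locate the year inside it, and ask the enclosing element for its location. Picking the year with the regex `\d{4}` is an assumption, and `DOMNode::getNodePath()` returns an XPath-style path (slashes and tag names) rather than the `td1_p1` format, so it would still need converting. `PREG_OFFSET_CAPTURE` reports byte offsets, which equal character offsets only for ASCII text.

```php
<?php
// Sketch: the HTML is a trimmed copy of the snippet above, wrapped in a
// table (an assumption) so that the <td> parses inside a proper table.
$html = '<table><tr><td valign="top">'
      . '<h3 class="blue-space">D-Lib Magazine</h3>'
      . '<p class="blue">November/December 2014<br>Volume 20, Number 11/12</p>'
      . '</td></tr></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Same query as the question, but keep the DOMText node itself.
$yearNode = $xpath->query('//p[1]//text()[following::br]')->item(0);

$publicationYear = $start = $end = null;
// First four-digit run; PREG_OFFSET_CAPTURE also returns its byte offset.
if (preg_match('/\d{4}/', $yearNode->nodeValue, $m, PREG_OFFSET_CAPTURE)) {
    $publicationYear = $m[0][0];                 // "2014"
    $start = $m[0][1];                           // 18
    $end = $start + strlen($publicationYear);    // 22 (exclusive)
}

// XPath-style location of the enclosing <p>.
$fathers = $yearNode->parentNode->getNodePath();

echo "$publicationYear $start $end $fathers\n";
```

For UTF-8 text containing non-ASCII characters, `mb_strpos()` would give character positions instead of byte positions.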
