dongzhuandian3292 2015-07-08 10:48

How do I get the text in an HTML tag, its offsets, and all of its parent tags up to the root tag in PHP?

From an article I extract, for example, the publication year, title, and authors like this:

$aut = $xpath->query("//table[@cellpadding='6']//b[1]");
$authors = array();
foreach($aut as $node)
    $authors[] = $node->nodeValue;
$title = $doc->getElementsByTagName('h3')->item(1);
$publicationYear = $xpath->query("//p[1]//text()[(following::br)]")->item(0)->nodeValue;
$aux = $xpath->query("//p[2]//text()[(preceding::br)]");
$doi = substr($aux->item($aux->length - 1)->nodeValue, 4);

For each of these strings (full name, year, title) I also need the chain of tags that enclose it, for example:

form1_table3_tbody1_tr1_td1_table5_tbody1_tr1_td2_p2

and the position inside that tag, e.g. start: 163, end: 190. All I know is that this information is grouped in certain tags. I also need the index of each tag when it has siblings, which is why the example has table3 for the third child of form1. Is there a way to do this in PHP, or at least in JavaScript?
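
A minimal sketch of how such a path could be built in PHP by walking up from a node with parentNode and counting previous siblings. The helper name ancestorPath and the sample markup are hypothetical, and the sketch assumes the index counts only siblings with the same tag name, starting at 1; if the index should count every element sibling instead, the nodeName comparison would be dropped.

```php
<?php
// Hypothetical helper: builds a path such as "html1_body1_div1_p2" for a node.
function ancestorPath(DOMNode $node): string
{
    // Text nodes have no tag name of their own, so start from the enclosing element.
    if ($node->nodeType !== XML_ELEMENT_NODE) {
        $node = $node->parentNode;
    }
    $parts = array();
    while ($node !== null && $node->nodeType === XML_ELEMENT_NODE) {
        // Assumption: the index counts siblings with the same tag name, 1-based.
        $index = 1;
        for ($sib = $node->previousSibling; $sib !== null; $sib = $sib->previousSibling) {
            if ($sib->nodeType === XML_ELEMENT_NODE && $sib->nodeName === $node->nodeName) {
                $index++;
            }
        }
        array_unshift($parts, $node->nodeName . $index);
        $node = $node->parentNode; // stops at the document node, which is not an element
    }
    return implode('_', $parts);
}

// Sample markup made up for illustration only.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div><h3>A</h3><p>first</p><p>second</p></div></body></html>');
$xpath = new DOMXPath($doc);
$text = $xpath->query('//p[2]/text()')->item(0);
echo ancestorPath($text), "\n";
```

The same function accepts any node returned by $xpath->query(), so it could be called on the nodes already selected for the authors, title, and year.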

UPDATE: In the article I have:

...
<td valign="top"> 
<h3 class="blue-space">D-Lib Magazine</h3>
<p class="blue">November/December 2014<br>
Volume 20, Number 11/12<br><a href="http://www.dlib.org/dlib/november14/brook/../11contents.html" target="_blank">Table of Contents</a>
</p>
...

and $publicationYear from the first snippet gets the value 2014. The first snippet works fine. I need to create three more variables, e.g. $fathers = ...td1_p1, $start = 18, $end = 22.
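
A rough sketch of how $start and $end might be computed, assuming the offsets are character positions within the text content of the enclosing element (0-based start, exclusive end). The helper name textOffsets is hypothetical, the markup is a trimmed copy of the snippet above wrapped in a table so it parses on its own, and the mbstring extension is assumed to be available. Note that it reports the first occurrence of the string only.

```php
<?php
// Hypothetical helper: locate $needle inside the text content of $element.
function textOffsets(DOMElement $element, string $needle): ?array
{
    // DOM strings are UTF-8, so count characters rather than bytes.
    $start = mb_strpos($element->textContent, $needle, 0, 'UTF-8');
    if ($start === false) {
        return null; // the string does not occur in this element
    }
    return array('start' => $start, 'end' => $start + mb_strlen($needle, 'UTF-8'));
}

// Trimmed version of the article markup, wrapped in a table for parsing.
$html = '<table><tr><td valign="top">'
      . '<h3 class="blue-space">D-Lib Magazine</h3>'
      . '<p class="blue">November/December 2014<br>Volume 20, Number 11/12<br></p>'
      . '</td></tr></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$p = $xpath->query('//p[@class="blue"]')->item(0);

$offsets = textOffsets($p, '2014');
echo $offsets['start'], ' ', $offsets['end'], "\n"; // prints "18 22"
```

For this markup the result matches the $start = 18 and $end = 22 given above, since "November/December " is 18 characters long.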
