duan0818 2017-11-10 09:52

PHP DOMXPath: extracting the href of an anchor inside a td

Using PHP DOMXPath, I need to get the "href" of an anchor that is contained inside a td. I already have the correct XPath to reach the td and I can get the text inside it, but I can't work out how to extract the anchor. For my other needs I must extract all the tr elements as a first step, so my current code is below:

$xpath = new DOMXPath($dom);
$trList = $xpath->query('//div[@id="main_content"]/table/tr/td/table[3]/tr[2]/td/table/tr');
$rowToSkip = 1;
foreach($trList as $rowNum => $row){        
        if($rowNum <= $rowToSkip){
            continue;
        }
        $cols = $row->childNodes;
        $dataList[($rowNum-$rowToSkip)]['number'] = preg_replace("/[^0-9]/", "", strip_tags($cols->item(2)->nodeValue));
}

How can I retrieve the href?

I also tried

$cols->item(2)->attributes->getNamedItem("href")->nodeValue

but with no luck.

Below is an HTML sample that is exactly the same as the original:

<div id="main_content">
<table class="wrapper" border="0" cellspacing="0" cellpadding="0">
        <tr>
            <td>
                <table border="0" cellspacing="0" cellpadding="0" id="breadcrumb">
                        <tr>
                            <td class="breadcrumb">
                                <a href="" class="breadcrumb">head link</a>
                                <a href="" class="breadcrumb">head link</a>
                            </td>
                        </tr>
                </table>
                <div><img src="space.gif" width="1" height="7" alt="" border="0"></div>                    
                <table border="0" cellspacing="0" cellpadding="0" width="100%">
                        <tr>
                            <td colspan="5" >test</td>
                        </tr>
                        <tr>
                            <td colspan="5"></td>
                        </tr>
                </table>
                <div><img width="1" height="32" border="0" alt="" src="space.gif"></div>
                <table border="0" cellpadding="0" cellspacing="0" width="100%">
                        <tr>
                            <td width="100%" >test 02</td>
                        </tr>
                        <tr>
                            <td>
                                <table width="100%" border="0" cellspacing="0" cellpadding="0">
                                        <tr>
                                            <td nowrap="nowrap" colspan="8">header col 1</td>
                                            <td nowrap="nowrap" colspan="5">header col 2</td>
                                        </tr>
                                        <tr>
                                            <td nowrap="nowrap">
                                                <a href="" >test col 0</a>
                                            </td>
                                            <td  nowrap="nowrap">
                                                <a href="" >test col 1</a>
                                            </td>
                                            <td  nowrap="nowrap">test col 2</td>
                                            <td  nowrap="nowrap">
                                                <a href="" >test col 3</a>
                                            </td>
                                            <td  nowrap="nowrap">
                                                <a href="" >test col 4</a>
                                            </td>
                                            <td  nowrap="nowrap">
                                                <a href="" >test col 5</a>
                                            </td>
                                            <td  nowrap="nowrap">test col 6</td>
                                            <td  nowrap="nowrap">test col 7</td>
                                            <td  nowrap="nowrap">test col 8</td>
                                            <td  nowrap="nowrap">test col 9</td>
                                            <td  nowrap="nowrap">test col 10</td>
                                            <td  nowrap="nowrap">test col 11</td>
                                            <td  nowrap="nowrap">test col 12</td>
                                        </tr>
                                        <tr>
                                            <td  nowrap="nowrap" rowspan="1">
                                                <a href="" >detail info col 0</a>
                                            </td>
                                            <td  nowrap="nowrap" rowspan="1" style="background-color:red">
                                                <a href="" >detail info col 1 this is needed column</a>                                                    
                                            </td>
                                            <td  nowrap="nowrap" rowspan="1">
                                                <a href="" >detail info col 2</a>
                                            </td>
                                            <td  nowrap="nowrap" rowspan="1">
                                                <a href="" >detail info col 3</a>
                                            </td>
                                            <td  nowrap="nowrap" rowspan="1">
                                                <a href="" >detail info col 4</a>
                                            </td>
                                           <td  nowrap="nowrap" rowspan="1">
                                                <a href="" >detail info col 5</a>
                                            </td>
                                            <td  nowrap="nowrap" rowspan="1">
                                                <a href="" >detail info col 6</a>
                                            </td>
                                            <td  nowrap="nowrap" rowspan="1">
                                                <a href="" >detail info col 7</a>
                                            </td>
                                            <td  nowrap="nowrap" rowspan="1">
                                                <a href="" >detail info col 8</a>
                                            </td>
                                            <td  nowrap="nowrap" rowspan="1">
                                                <a href="" >detail info col 9</a>
                                            </td>
                                            <td  nowrap="nowrap" rowspan="1">
                                                <a href="" >detail info col 10</a>
                                            </td>
                                            <td  nowrap="nowrap" rowspan="1">
                                                <a href="" >detail info col 11</a>
                                            </td>
                                            <td  nowrap="nowrap" rowspan="1">
                                                <a href="" >detail info col 12</a>
                                            </td>
                                        </tr>
                                </table>
                            </td>
                        </tr>
                </table>
            </td>
        </tr>
</table>

1 answer

  • dongtiaobeng7901 2017-11-10 10:16

    With the structure you posted, the following outputs the href-value:

    <?php
    $dom = new DOMDocument('1.0');
    $dom->loadHTMLFile('input.html');
    
    $xpath = new DOMXPath($dom);
    
    $query = '//*[@id="main_content"]/table/tr/td/table[3]/tr[2]/td/table/tr[position() >= 3]/td[2]/a';
    
    $nodes = $xpath->query($query);
    
    foreach ($nodes as $node) {
        /** @var $node DOMElement */
        var_dump(
            $node->getAttribute('href'), // the href-attribute value
            $node->nodeValue // the inner text
        );
    }
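
If the rows still have to be walked one by one, as in the question's loop, a relative XPath can be evaluated against each row instead of indexing `childNodes`. The `href` attribute belongs to the `a` element, not to the `td`, so reading the attributes of the cell cannot find it, and `childNodes` may also contain whitespace text nodes, which makes numeric indexes fragile. The following is only a minimal sketch under assumptions: the markup is a hypothetical, trimmed-down version of the posted table, and the `/a0`, `/a1`, `/d0`, `/d1?id=42` values are invented placeholders, since the sample's `href` attributes are empty.

```php
<?php
// Hypothetical, shortened markup: header row, label row, one data row.
$html = '<div id="main_content"><table>'
      . '<tr><td>header col 1</td><td>header col 2</td></tr>'
      . '<tr><td><a href="/a0">test col 0</a></td><td><a href="/a1">test col 1</a></td></tr>'
      . '<tr><td><a href="/d0">detail 0</a></td><td><a href="/d1?id=42">detail 1</a></td></tr>'
      . '</table></div>';

// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath  = new DOMXPath($dom);
$trList = $xpath->query('//div[@id="main_content"]/table/tr');

$rowToSkip = 1;
$dataList  = [];

foreach ($trList as $rowNum => $row) {
    if ($rowNum <= $rowToSkip) {
        continue; // skip the header and label rows, as in the question
    }

    // Relative expression: the second argument makes $row the context node,
    // so 'td[2]/a' means "the anchor in the second cell of this row".
    $anchor = $xpath->query('td[2]/a', $row)->item(0);

    if ($anchor instanceof DOMElement) {
        $dataList[$rowNum - $rowToSkip] = [
            'href' => $anchor->getAttribute('href'),
            'text' => trim($anchor->nodeValue),
        ];
    }
}

print_r($dataList);
```

The `instanceof` check skips rows whose second cell has no anchor. When only the attribute string is needed, `$xpath->evaluate('string(td[2]/a/@href)', $row)` is another option; it returns an empty string when nothing matches.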
    
    The asker accepted this answer as the best answer.
