douyan2821 2013-06-25 02:14
Accepted

Parsing several HTML tables with DOM [closed]

While preparing to do the following, I found a lot of information that was not clear, so I thought I'd ask in case someone could clear some things up for me.

What exactly is the @ symbol doing in the following?

 $domOb = new DOMDocument();
 $html  = @$domOb->loadHTMLFile('http:...'); 

This did suppress an error, and the data was still parsed, but is this good practice? I have also run it without the @ symbol and got the expected results.

Given that I have several tables, what is the best/simplest way to get all the <td> elements from, let's say, table 3? I was going to list all the <td> elements and then simply start and end at the values that correspond to the data I need.

If I'm parsing HTML in PHP, I like the idea of using DOM, so what should I use to load the file: loadHTMLFile() or loadHTML()? Can I still use XPath? And if the HTML is very busy or badly marked up, does that matter?

What's good practice for looking through the data?

    $items = $domOb->getElementsByTagName('td');

    $k    = 0;
    $num  = $items->length;
    while ($k < $num)
    {
        // textContent returns the cell's text, with entities already decoded
        echo $item_web = $items->item($k)->textContent, '<br>';
        $k++;
    }
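A minimal sketch of one way to scope the loop to table 3 and iterate idiomatically, assuming the tables appear in document order. The inline `$html` string is hypothetical sample data standing in for the real page, and `$cells` is an illustrative name. `DOMNodeList::item()` is zero-based, so the third table is `item(2)`, and a `DOMNodeList` can be walked with `foreach` instead of a manual counter.

```php
<?php
// Hypothetical sample markup standing in for the real page:
// three small tables, the third one holding the parcel data.
$html = '<table><tr><td>t1</td></tr></table>'
      . '<table><tr><td>t2</td></tr></table>'
      . '<table><tr><td>Parcel ID: 666666</td><td>Name: Mr. help</td></tr>'
      . '<tr><td>City: Helpover 66666</td></tr></table>';

$domOb = new DOMDocument();
$domOb->loadHTML($html);

// item() is zero-based, so item(2) is the third <table> in document order.
$table = $domOb->getElementsByTagName('table')->item(2);

$cells = [];
if ($table !== null) {
    // Only the <td> elements inside table 3 are visited.
    foreach ($table->getElementsByTagName('td') as $td) {
        $cells[] = trim($td->textContent);
    }
}

echo implode('<br>', $cells), "\n";
```

Scoping to the table first avoids counting through every `<td>` in the document to find where the needed data starts and ends.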

I found "How do you parse and process HTML/XML in PHP?", which is good, but it's 2 years old, so I thought I'd pose a few questions.

Here is just a small clip of the 3rd table. At first glance I noticed a space in the 3rd tag; does this affect the results?

 <td>Parcel ID: <a href=... style=text-decoration:underline;><b>666666</b></a></td>
 <td>Name: Mr. help</td></tr><tr>
 <td >Parcel Address: 666 help RD&nbsp;</td>
 <td>Name2: Ms. help F</td></tr><tr><td>City: Helpover 66666</td>
 <td>Address: 6666 6TH AVE NE UNIT 333</td>

2 answers

  • doutangkao2789 2013-06-25 03:00

    what exactly is the @ symbol doing to the following

    It's the error-control operator: it suppresses the warnings libxml raises while parsing malformed HTML. That is not the right way to handle this with DOMDocument and related extensions, though. The correct way is to call libxml_use_internal_errors(true); before loading the malformed HTML.
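A minimal sketch of that approach, assuming PHP with the DOM and libxml extensions. The markup is a hypothetical example with a stray closing tag, the kind of problem libxml typically reports. Unlike `@`, this keeps the messages available through `libxml_get_errors()`, so they can be logged or inspected instead of silently discarded.

```php
<?php
// Hypothetical malformed markup: a stray </div> with no matching opening tag.
$html = '<table><tr><td>Name: Mr. help</td></tr></table></div>';

// Buffer libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$domOb = new DOMDocument();
$loaded = $domOb->loadHTML($html);

// Inspect (or log) whatever the parser complained about, then clear the buffer.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    echo 'libxml: ', trim($error->message), ' on line ', $error->line, "\n";
}
libxml_clear_errors();

// Restore the previous setting so unrelated code is unaffected.
libxml_use_internal_errors($previous);
```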

    can I still use Xpath?.

    Yes:

    $xpath = new DOMXPath($domOb);
    $tds = $xpath->query('//td');
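Applied to getting the cells of table 3, a sketch along these lines should work. The sample markup is hypothetical, with each table wrapped in its own `<div>` to show why the parentheses in the expression matter. Unlike `DOMNodeList::item()`, XPath positions are 1-based.

```php
<?php
// Hypothetical markup: each table sits in its own container, as is common on real pages.
$html = '<div><table><tr><td>t1</td></tr></table></div>'
      . '<div><table><tr><td>t2</td></tr></table></div>'
      . '<div><table><tr><td>Parcel ID: 666666</td><td>Name: Mr. help</td></tr></table></div>';

$domOb = new DOMDocument();
$domOb->loadHTML($html);
$xpath = new DOMXPath($domOb);

// (//table)[3] is the third <table> in the whole document.
$tds = $xpath->query('(//table)[3]//td');

// //table[3] means "a <table> that is the third table child of its parent",
// so it matches nothing here because each table is alone in its <div>.
$unscoped = $xpath->query('//table[3]//td')->length; // 0 here

$cells = [];
foreach ($tds as $td) {
    $cells[] = $td->textContent;
}
```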
    

    I noticed a space at the 3rd tag does this affect the results?

    Entities such as &nbsp; are decoded when you read the textContent property of your TD nodes.
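A minimal sketch using the third cell from the question's snippet, assuming the goal is clean label/value text. Whitespace inside the tag (`<td >`) is ordinary HTML and parses the same as `<td>`. The trailing `&nbsp;` decodes to U+00A0 rather than a normal space, so `trim()` alone leaves it in place.

```php
<?php
// The third cell from the snippet: whitespace inside the tag and a trailing &nbsp;.
$html = '<table><tr><td >Parcel Address: 666 help RD&nbsp;</td></tr></table>';

$domOb = new DOMDocument();
$domOb->loadHTML($html);

// <td > is still an ordinary <td> element.
$td = $domOb->getElementsByTagName('td')->item(0);
$raw = $td->textContent;

// &nbsp; became U+00A0 (UTF-8 bytes C2 A0), which trim() does not strip,
// so replace it with an ordinary space before trimming.
$clean = trim(str_replace("\u{00A0}", ' ', $raw));

echo $clean, "\n"; // Parcel Address: 666 help RD
```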

    This answer was accepted by the asker as the best answer.
