dongxun3777 2013-11-12 03:51

Trying to parse a webpage using PHP

I am trying to parse a webpage and print out a table that appears on it. I am using the PHP Simple HTML DOM Parser. However, when I parse the table off the page, all the JavaScript commands that output the table end up as comments in my PHP output:

<html>
<script type="text/javascript" src="jquery.js"></script>
<?php
    // Simple HTML DOM Parser (third-party library).
    include 'crawling/simple_html_dom.php';

    // Fetch the page and build a DOM from it.
    $html = file_get_html('http://uiucfreefood.com/');

    // Walk to the table: first <body>, then the <div> at index 10, then the first <table>.
    $ret = $html->find('body', 0)->find('div', 10)->find('table', 0);

    // Nothing visible is echoed: the original page uses JavaScript commands to write
    // the table, and those commands show up as comments in the output.
    echo $ret;
?>
</html>

When I inspect the element on my page where I echo the parsed markup, I can see that the table tag with all the info is there, but the JavaScript commands have been turned into comments. Is there a way for me to just grab the info and echo it out myself? I tried appending another ->find('tbody') to the parse chain, but it makes no difference. Any advice is appreciated. Thanks.

EDIT: You can try this code yourself if you download simple_html_dom.php and include it in your PHP file. Source: http://sourceforge.net/projects/simplehtmldom/files/

EDIT: Just noticed something really important. The JavaScript commands are commented out in the original webpage as well. The original page instead uses a JavaScript function to print out the table, and I do not have that function defined on my page. Writing that function myself should fix the issue.

EDIT: yup, that worked.
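
For the "grab the info and echo it out myself" route, here is a minimal server-side sketch using PHP's built-in DOMDocument instead of re-implementing the JavaScript function. It assumes the rows are produced by inline script calls wrapped in the old `<!-- ... //-->` hiding idiom. The function name `addRow`, the sample markup, and the helper `extractRowCalls` are all hypothetical, since the real page's function name is not shown above.

```php
<?php
// Hypothetical stand-in for the fetched page: each table row is emitted by a
// call to a page-defined function (named addRow here purely for illustration).
$sample = <<<'HTML'
<html><body><div>
<script type="text/javascript"><!--
addRow("Pizza night", "Siebel 1404");
addRow("Bagel breakfast", "Illini Union");
//--></script>
</div></body></html>
HTML;

/**
 * Collect the double-quoted arguments of every $functionName(...) call
 * found in the page's <script> elements. One inner array per call.
 */
function extractRowCalls(string $html, string $functionName): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    // Naive call matcher: breaks if an argument itself contains ")".
    $callPattern = '/\b' . preg_quote($functionName, '/') . '\s*\(([^)]*)\)/';
    foreach ($doc->getElementsByTagName('script') as $script) {
        if (!preg_match_all($callPattern, $script->textContent, $calls)) {
            continue; // this script has no matching calls
        }
        foreach ($calls[1] as $argumentList) {
            // Pull out each double-quoted argument; escaped quotes are not handled.
            preg_match_all('/"([^"]*)"/', $argumentList, $arguments);
            $rows[] = $arguments[1];
        }
    }
    return $rows;
}

$rows = extractRowCalls($sample, 'addRow');
foreach ($rows as $cells) {
    echo implode(' | ', $cells), PHP_EOL;
}
```

With a live page, `$sample` would be replaced by the string returned from `file_get_contents()`, and `'addRow'` by whatever function the page actually calls. The regular expressions are deliberately simple and would need hardening for arguments that contain parentheses or escaped quotes.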

1 answer

  • dragon8002 2013-11-12 04:05

    Try using file_get_contents instead of file_get_html and see if that works. Honestly, depending on your needs, you could write your own parser. It is not that hard to write one that scans for the table and displays it.

    You will just need the following:

    $array = explode("<table>", $content); // explode() replaces split(), which was removed in PHP 7
    $boolPlaceHolder = false;

    You can then set the placeholder to true when you encounter the opening <table> tag. This way you can scan through the characters of the content and grab the table.
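
The scan described above can be sketched with plain string functions. This is only an illustration under stated assumptions: the helper name `extractFirstTable` and the inline sample are hypothetical, nested tables are not handled, and a table that is written into the page by JavaScript will not appear in the raw markup at all.

```php
<?php
/**
 * Return the markup of the first <table>...</table> block in $content,
 * or null when no complete table is present. Nested tables are not handled.
 */
function extractFirstTable(string $content): ?string
{
    // Case-insensitive search so <TABLE ...> is matched too; attributes are allowed.
    $start = stripos($content, '<table');
    if ($start === false) {
        return null;
    }
    $closingTag = '</table>';
    $end = stripos($content, $closingTag, $start);
    if ($end === false) {
        return null; // opening tag without a matching close
    }
    return substr($content, $start, $end - $start + strlen($closingTag));
}

// Inline sample so the sketch runs offline; with a live page, $content would
// come from file_get_contents($url) instead.
$content = '<p>intro</p><table class="food"><tr><td>Pizza</td></tr></table><p>outro</p>';
$table = extractFirstTable($content);
echo $table === null ? 'no table found' : $table, PHP_EOL;
```

Searching for `<table` rather than the literal `<table>` keeps the match working when the tag carries attributes, which a split on the exact string `<table>` would miss.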

    Hope this helps.

    This answer was selected by the asker as the best answer.
