dongxun3777 2013-11-12 03:51
Views: 47
Accepted

Trying to parse a webpage using PHP

I am trying to parse a webpage and print out a table that appears on it. I am using the PHP Simple HTML DOM Parser. However, when I parse the table off the webpage, all the JavaScript commands that output the table show up as comments in the result:

<html>
<script type="text/javascript" src="jquery.js"></script>
<?php
    include 'crawling/simple_html_dom.php';
    $html = file_get_html('http://uiucfreefood.com/');
    if (!$html) {
        die('Could not load the page.');
    }

    // Walk down to the <table> tag: first <body>, its 11th <div> (indexes are zero-based), first <table>.
    $ret = $html->find('body', 0)->find('div', 10)->find('table', 0);
    // Nothing visible is echoed: the original webpage uses JavaScript commands to write the table,
    // and those commands appear as comments in the parsed output.
    echo $ret;
?>
</html>

When I inspect the element on the page where I echo the parsed information, I can see that the table tag with all the info is there, but the JavaScript commands have been turned into comments. Is there a way for me to just grab the info and echo it out myself? I tried adding another ->find('tbody'); at the end of the parse command, but it doesn't do anything. Any advice is appreciated. Thanks.

EDIT: You can try this code out yourself if you download the simple_html_dom.php and include it in your php file. Source: http://sourceforge.net/projects/simplehtmldom/files/

EDIT: Just noticed something really important. The JavaScript commands are commented out in the original webpage as well. The original webpage instead uses a JavaScript function to print out the table, and that function is not defined on my page. Writing that function myself should fix the issue.

EDIT: yup, that worked.
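
For anyone who would rather read the commented-out data on the server side than redefine the JavaScript function, here is a rough sketch using PHP's built-in DOM extension instead of simple_html_dom. It assumes the data sits in ordinary HTML comment nodes inside the table, which may not match how the real page is built. The function name tableComments, the sample markup, and the addRow call inside it are all made up for illustration.

```php
<?php
// Hypothetical helper: returns the text of every HTML comment found
// inside any <table> element of $markup.
function tableComments(string $markup): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them;
    // real-world HTML is rarely strictly valid.
    libxml_use_internal_errors(true);
    $doc->loadHTML($markup);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $comments = [];
    // comment() selects comment nodes; //table// restricts them to tables.
    foreach ($xpath->query('//table//comment()') as $node) {
        $comments[] = trim($node->nodeValue);
    }
    return $comments;
}

// Made-up markup standing in for the real page (an assumption, not its actual source).
$markup = '<html><body><table><tr><td><!-- addRow("Pizza", "Noon") --></td></tr></table></body></html>';
print_r(tableComments($markup));
```

Each returned string would still have to be parsed for its arguments, so this only helps if the commented-out calls follow a regular pattern.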

1 answer

  • dragon8002 2013-11-12 04:05

    Try using file_get_contents instead of file_get_html and see if that works. Honestly, depending on your needs, you should code your own parser. It is not that hard to write one that scans for the table and displays it.

    You will just need the following:

    // split() was removed in PHP 7; explode() does the same job for a literal delimiter.
    $array = explode("<table>", $content);
    $boolPlaceHolder = false;
    

    You can then set the placeholder to true when you encounter the opening table tag. This way you can scan through the characters of the content and grab the table.
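
    A minimal sketch of that scan, assuming the page has already been fetched into a string (for example with file_get_contents). The function name extractFirstTable and the sample markup are hypothetical. It uses string searches instead of a boolean flag, matches "<table" rather than "<table>" so that tags with attributes are found, and does not handle nested tables. The nullable return type needs PHP 7.1 or later.

    ```php
    <?php
    // Hypothetical helper: returns the first <table>...</table> block in $content,
    // or null if no complete table is found. Nested tables are not handled.
    function extractFirstTable(string $content): ?string
    {
        // Case-insensitive search for the opening tag, with or without attributes.
        $start = stripos($content, '<table');
        if ($start === false) {
            return null;
        }

        // Look for the closing tag only after the opening tag.
        $closeTag = '</table>';
        $end = stripos($content, $closeTag, $start);
        if ($end === false) {
            return null;
        }

        // Include the closing tag itself in the returned slice.
        return substr($content, $start, $end - $start + strlen($closeTag));
    }

    // Sample markup standing in for the downloaded page.
    $content = '<div><TABLE class="food"><tr><td>Pizza</td></tr></table></div>';
    echo extractFirstTable($content), PHP_EOL;
    ```

    For anything beyond a single, simple table, a real HTML parser is likely to be more robust than string searches.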

    Hope this helps.

    This answer was selected by the asker as the best answer.