Regular user 2015-10-20 09:01
Accepted

Can't get the correct values from another page

I am trying to get the standings table from this page: http://www.skysports.com/football/competitions/bundesliga/table. I fetch it with:

$bundes = file('http://www.skysports.com/football/competitions/bundesliga/table');

To display the $bundes array I use:

echo '<pre>', print_r($bundes, true), '</pre>';

The part I am trying to display comes out like this:

[1437] => 
[1022] => German Bundesliga 2015/16
#   Team    Pl  W   D   L   F   A   GD  Pts Last 6
1   [1059] => [1060] => Bayern Munich [1061] => [1062] =>   9   9   0   0   29  4   25  27  [1072] =>
[1073] =>
[1074] =>

This is the first row of the table. I can display $bundes[1060] and get the output Bayern Munich, but how can I get the values from $bundes[1062], i.e. 9, 9, 0, 0, 29, 4, 25 and 27? I need to display each of these values in <td></td>. When I try to echo $bundes[1062] I get nothing.
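The symptom above is most likely caused by print_r() output being rendered as HTML: each element of $bundes is one raw line of the page's markup, and the browser renders the tags instead of showing them, so the numbers seen after [1062] probably live in other elements and $bundes[1062] itself may hold only tags. A minimal sketch for checking this, assuming that explanation: escape every line with htmlspecialchars() before printing. The helper escapeLines() and the two sample lines are hypothetical stand-ins, not the real Sky Sports markup.

```php
<?php
// Hypothetical helper: escape each line so the browser displays the HTML
// tags as text instead of rendering them (which hides or moves content).
function escapeLines(array $lines): array
{
    $escaped = [];
    foreach ($lines as $key => $line) {
        // Keep the original keys so index 1062 still means line 1062.
        $escaped[$key] = htmlspecialchars($line, ENT_QUOTES, 'UTF-8');
    }
    return $escaped;
}

// Hypothetical sample lines standing in for $bundes = file('http://...');
$lines = [
    "<tr>\n",
    "<td>9</td>\n",
];

// print_r(..., true) returns the dump as a string instead of printing it.
echo '<pre>', print_r(escapeLines($lines), true), '</pre>';
```

On the real page, pass $bundes instead of $lines; the escaped dump then shows what $bundes[1062] and its neighbours actually contain.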


1 answer

  • dongwang6837 2015-10-20 10:00

    A more reliable way to extract the data is to parse the page with PHP's DOM classes (DOMDocument and DOMXPath), for example:

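    $doc = new \DOMDocument();
    // Collect HTML parse warnings internally instead of silencing them with @
    libxml_use_internal_errors(true);
    $doc->loadHTMLFile('http://www.skysports.com/football/competitions/bundesliga/table');
    libxml_clear_errors();
    
    $xpath = new \DOMXPath($doc);
    // Every row inside a table body, i.e. one row per team
    $rows = $xpath->query('//tbody/tr');
    
    $data = [];
    
    foreach ($rows as $i => $row) {
        // The cells of this row only (query relative to $row)
        $columns = $xpath->query('td', $row);
    
        foreach ($columns as $column) {
            $data[$i][] = trim($column->textContent);
        }
    }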
    
    print_r($data);
    

    Which gives you:

    Array
    (
        [0] => Array
            (
                [0] => 1
                [1] => Bayern Munich
                [2] => 9
                [3] => 9
                [4] => 0
                [5] => 0
                [6] => 29
                [7] => 4
                [8] => 25
                [9] => 27
                [10] => 
            )
    ...
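The question asks for each value inside <td></td>. A minimal sketch that turns rows shaped like the $data array above into HTML table rows; renderRows() is a hypothetical helper, and the sample input is the first row of the output shown above.

```php
<?php
// Hypothetical helper: build <tr>/<td> markup from rows shaped like $data.
function renderRows(array $rows): string
{
    $html = '';
    foreach ($rows as $row) {
        $html .= '<tr>';
        foreach ($row as $cell) {
            // Escape the cell text so names containing & or < render safely.
            $html .= '<td>' . htmlspecialchars($cell, ENT_QUOTES, 'UTF-8') . '</td>';
        }
        $html .= "</tr>\n";
    }
    return $html;
}

// Sample input: the first row of the print_r output shown above.
$data = [
    ['1', 'Bayern Munich', '9', '9', '0', '0', '29', '4', '25', '27', ''],
];

echo '<table>', "\n", renderRows($data), '</table>', "\n";
```

With the live page, pass the $data built by the DOMXPath loop instead of the hard-coded sample.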
    

    Regarding Dagon's comment: no terms can disallow crawling and extracting the data (as long as you do it at a reasonable rate that does not affect the website's performance). Terms of use and copyright law do, however, dictate what you can and cannot do with the crawled content (e.g. republish it).

    Web scraping may be against the terms of use of some websites. The enforceability of these terms is unclear (see "FAQ about linking – Are website terms of use binding contracts?").

    - Wikipedia, Web scraping: Legal issues

    By the way, the page's robots meta tag does allow INDEX.
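To check the robots meta tag yourself, the same DOM classes can read its content attribute. A minimal sketch, assuming the tag is written with a lowercase name="robots"; robotsMeta() is a hypothetical helper and the sample markup is made up. Crawl directives can also come from an X-Robots-Tag HTTP header or from robots.txt, which this does not look at.

```php
<?php
// Hypothetical helper: return the content of <meta name="robots">, or null.
function robotsMeta(string $html): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);  // tolerate real-world, non-valid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Select the content attribute of the first robots meta tag, if any.
    $nodes = $xpath->query('//meta[@name="robots"]/@content');
    return $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
}

// Made-up sample markup; use the fetched page's HTML to check the real tag.
$html = '<html><head><meta name="robots" content="index, follow"></head><body></body></html>';
var_dump(robotsMeta($html));
```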

    The asker selected this answer as the best answer.
