dtrotfd1012 2019-01-08 10:14
Accepted

Scraping a page with PHP

I want to scrape some data from soccerstats.com with the PHP Simple HTML DOM Parser, but I can't, because a cookie-consent page always appears before the normal page loads. How can I bypass the cookie page? This is my code:

<?php
    include_once('../scrapper/scrapper.php');
    $url = 'https://www.soccerstats.com/matches.asp';
    $html = file_get_html($url);

    $stats = array();
    foreach($html->find('table') as $table) {
        $stats[] = $table->outertext;
    }
    $results = implode(",", $stats);    

    echo $results; 
?>

1 answer

  • dougu3290 2019-01-08 10:43

    A quick look at https://www.soccerstats.com/matches.asp shows that the "cookie page" only requires the user to click a button. Clicking it sets a cookie named cookiesok to the value yes, as seen in the source of that page:

    <button class="button button3" onclick=" setCookielocal('cookiesok', 'yes', 365)"><font size='4'>I agree. Continue to website.</font></button>
    

    So what we need to do is make PHP fetch the page with this cookie already set.

    Since you're using the Simple HTML DOM library (https://sourceforge.net/projects/simplehtmldom/) and its file_get_html() function, I looked at the source code of that function and found that it uses file_get_contents() behind the scenes. It also lets us pass our own "context", which we can create with the stream_context_create() function.

    In short, stream_context_create() lets us build a context that carries the required cookie, and file_get_html() will send it with the request.

    Final code:

    <?php
    
        include_once '../scrapper/scrapper.php';
    
        // Options for the context we're about to create.
        $options = [
            "http" => [
                "header" => "Cookie: cookiesok=yes\r\n",
            ],
        ];
    
        // Context we're going to pass to the file_get_html() function.
        $context = stream_context_create($options);
    
        $url = 'https://www.soccerstats.com/matches.asp';
        $html = file_get_html($url, false, $context);
    
        $stats = array();
        foreach($html->find('table') as $table) {
            $stats[] = $table->outertext;
        }
        $results = implode(",", $stats);
    
        echo $results;
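
If more than one cookie is ever needed, the header string can be built from an array instead of being written by hand. The following is only a minimal sketch of that idea: buildCookieOptions() is a hypothetical helper name, not part of Simple HTML DOM, and it assumes the cookie names and values contain no characters that need encoding.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): builds the options
// array for stream_context_create() so that the given cookies are sent.
function buildCookieOptions(array $cookies): array
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        // Assumption: values are plain tokens and are sent as-is.
        $pairs[] = $name . '=' . $value;
    }

    return [
        'http' => [
            // Multiple cookies go in a single Cookie header, separated by "; ".
            'header' => 'Cookie: ' . implode('; ', $pairs) . "\r\n",
        ],
    ];
}

$options = buildCookieOptions(['cookiesok' => 'yes']);
$context = stream_context_create($options);
```

The resulting $context can then be passed as the third argument to file_get_html(), exactly as in the code above.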
    