dtrotfd1012 2019-01-08 02:14
Views: 69
Accepted

Scraping a page with PHP

I want to scrape some data from soccerstats.com with PHP Simple HTML DOM Parser, but I can't, because a cookie consent page always appears before the normal page loads. How can I bypass the cookie page? My code is this:

<?php
    include_once('../scrapper/scrapper.php');
    $url = 'https://www.soccerstats.com/matches.asp';
    $html = file_get_html($url);

    $stats = array();
    foreach($html->find('table') as $table) {
        $stats[] = $table->outertext;
    }
    $results = implode(",", $stats);    

    echo $results; 
?>

1 answer

  • dougu3290 2019-01-08 02:43

    A quick look at the page https://www.soccerstats.com/matches.asp shows that the "cookie page" only requires the user to click a button. When clicked, the button sets a cookie named cookiesok to the value yes, as seen in the source of that page:

    <button class="button button3" onclick=" setCookielocal('cookiesok', 'yes', 365)"><font size='4'>I agree. Continue to website.</font></button>
    

    So we need to make PHP fetch the page with this cookie already set.

    Since you're using the https://sourceforge.net/projects/simplehtmldom/ library and its file_get_html() function, I looked into the source code of that function. It uses file_get_contents() behind the scenes, and it also lets us pass our own "context", which we can create with the stream_context_create() function.

    In short, stream_context_create() lets us create a context that carries the required cookie, and we can then pass that context to file_get_html().
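
    To make concrete what such a context holds, here is a minimal sketch that needs no network access and no external library. It builds the Cookie header from an array and reads the options back with stream_context_get_options(). The helper name buildCookieHeader is hypothetical and only used for illustration; cookie values are passed through as-is, which assumes they contain no characters that need escaping.

    ```php
    <?php

    // Hypothetical helper (not part of PHP or Simple HTML DOM): turns an
    // associative array of cookies into a single "Cookie:" request header line.
    function buildCookieHeader(array $cookies): string
    {
        $pairs = [];
        foreach ($cookies as $name => $value) {
            // Values are used as-is; this assumes they need no escaping.
            $pairs[] = $name . '=' . $value;
        }
        // Multiple cookies are separated by "; " and the line ends with CRLF.
        return 'Cookie: ' . implode('; ', $pairs) . "\r\n";
    }

    // The consent cookie named in the page source shown above.
    $context = stream_context_create([
        'http' => [
            'header' => buildCookieHeader(['cookiesok' => 'yes']),
        ],
    ]);

    // Read the options back to see what would be sent with the request.
    $options = stream_context_get_options($context);
    echo $options['http']['header'];
    ```

    If the site ever required more than one cookie, the same helper could be given additional name/value pairs.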

    Final code:

    <?php
    
        include_once '../scrapper/scrapper.php';
    
        // Options for the context we're about to create.
        $options = [
            "http" => [
                "header" => "Cookie: cookiesok=yes\r\n",
            ],
        ];
    
        // Context we're going to pass to the file_get_html() function.
        $context = stream_context_create($options);
    
        $url = 'https://www.soccerstats.com/matches.asp';
        $html = file_get_html($url, false, $context);
    
        $stats = array();
        foreach($html->find('table') as $table) {
            $stats[] = $table->outertext;
        }
        $results = implode(",", $stats);
    
        echo $results;
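
    The final code above assumes the request succeeds. As a hedged alternative sketch, the same idea can be written with PHP's built-in DOMDocument instead of Simple HTML DOM, with an explicit failure check on the fetch. This is a swapped-in technique, not what the library does internally; the function names extractTables and fetchWithConsentCookie are hypothetical, and the demonstration at the bottom runs on a literal string rather than the live site.

    ```php
    <?php

    // Collects the outer HTML of every <table> in an HTML string using the
    // built-in DOM extension (not Simple HTML DOM).
    function extractTables(string $html): array
    {
        $doc = new DOMDocument();
        // Collect parser warnings internally; real-world HTML is rarely strict.
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $tables = [];
        foreach ($doc->getElementsByTagName('table') as $table) {
            $tables[] = $doc->saveHTML($table);
        }
        return $tables;
    }

    // Fetches a page with the consent cookie set; returns null on failure.
    function fetchWithConsentCookie(string $url): ?string
    {
        $context = stream_context_create([
            'http' => ['header' => "Cookie: cookiesok=yes\r\n"],
        ]);
        $body = @file_get_contents($url, false, $context);
        return $body === false ? null : $body;
    }

    // Offline demonstration on a literal string; no request is made here.
    $sample = '<html><body><table><tr><td>1</td></tr></table><p>x</p></body></html>';
    $tables = extractTables($sample);
    echo count($tables), "\n";
    ```

    For the live page, one would call fetchWithConsentCookie() with the URL from the question and pass the result to extractTables() only when it is not null.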
    
