doubeishuai6598 2015-10-20 16:50
Views: 39

Getting the HTML code of an https website with PHP or JavaScript

I have tried file_get_contents and cURL, but neither seems to work on this website. I have used both in previous projects.

Website: https://colruyt.collectandgo.be/cogo/nl/zoeken?z=5030

Does anyone have a working solution? I have been looking and testing for hours now :).


cURL does not seem to work either. The response headers are:

HTTP/1.1 200 OK Content-Length: 5395 Pragma: no-cache Cache-Control: no-cache Content-Type: text/html

It redirects to my own main domain name. I used this code:

    <?php
    function geturl($url) {
        if (!function_exists('curl_init')) {
            die('cURL must be installed for geturl() to work. Ask your host to enable it or uncomment extension=php_curl.dll in php.ini');
        }

        $cookie = tempnam("/tmp", "CURLCOOKIE");
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; CrawlBot/1.0.0)');
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
        curl_setopt($ch, CURLOPT_HEADER, true);            // include the response headers in $html
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);
        //curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_ENCODING, "");
        curl_setopt($ch, CURLOPT_AUTOREFERER, true);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);   // skips certificate verification (insecure)
        curl_setopt($ch, CURLOPT_MAXREDIRS, 15);

        $html = curl_exec($ch);
        $status = curl_getinfo($ch);
        curl_close($ch);

        // Redirects are followed by hand because CURLOPT_FOLLOWLOCATION is disabled above.
        if ($status['http_code'] == 301 || $status['http_code'] == 302) {
            // The header block ends at the first blank line (CRLF CRLF).
            list($header) = explode("\r\n\r\n", $html, 2);
            $matches = array();
            if (!preg_match("/(Location:|URI:)[^\r\n]*/", $header, $matches)) {
                return '';
            }
            $url = trim(str_replace($matches[1], "", $matches[0]));
            $url_parsed = parse_url($url);
            return ($url_parsed !== false) ? geturl($url) : '';
        }
        return $html;
    }

    echo geturl("https://colruyt.collectandgo.be/cogo/nl/zoeken?z=5030");
    ?>

1 answer

  • dsj1961061 2015-10-20 16:52

    Take a look at the request module for Node.js:

        var request = require('request');

        request('https://colruyt.collectandgo.be/cogo/nl/zoeken?z=5030', function (error, response, body) {
          if (error) {
            console.error(error); // request failed, so there is no body to print
            return;
          }
          console.log(body);
        });
    

    This prints the body of the response.
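
If pulling in a third-party module is not an option, the same idea can be sketched with Node's built-in http and https modules. This is only an illustration under stated assumptions, not part of the answer above: the helper names nextUrl and fetchHtml are hypothetical, the User-Agent string is borrowed from the PHP code in the question, and redirects are followed by hand, much as the question's code tries to do.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper: return the absolute URL to follow for a redirect
// response, or null when the response is not a redirect or has no Location.
function nextUrl(currentUrl, statusCode, headers) {
  const isRedirect = [301, 302, 303, 307, 308].includes(statusCode);
  if (!isRedirect || !headers.location) {
    return null;
  }
  // Location may be relative, so resolve it against the current URL.
  return new URL(headers.location, currentUrl).toString();
}

// Hypothetical helper: fetch a page, following at most maxRedirects redirects.
// callback(error, body) follows the usual Node convention.
function fetchHtml(url, maxRedirects, callback) {
  // Pick the client that matches the protocol of the URL being requested.
  const client = url.startsWith('https:') ? https : http;
  const options = {
    headers: { 'User-Agent': 'Mozilla/5.0 (compatible; CrawlBot/1.0.0)' }
  };
  client.get(url, options, (res) => {
    const target = nextUrl(url, res.statusCode, res.headers);
    if (target !== null) {
      res.resume(); // discard the body of the redirect response
      if (maxRedirects <= 0) {
        callback(new Error('Too many redirects'));
        return;
      }
      fetchHtml(target, maxRedirects - 1, callback);
      return;
    }
    const chunks = [];
    res.on('data', (chunk) => chunks.push(chunk));
    res.on('end', () => callback(null, Buffer.concat(chunks).toString('utf8')));
  }).on('error', callback);
}
```

Calling fetchHtml('https://colruyt.collectandgo.be/cogo/nl/zoeken?z=5030', 15, callback) would then pass the final page body to callback(error, body). Whether this particular site serves the expected HTML to a non-browser client is not something the sketch can guarantee.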
