dongxun3777
2016-07-15 16:20
Answer accepted

How to send custom headers with PHP Goutte

I am trying to scrape a site that actively blocks bots.

I use the following PHP cURL code to get past the blocking:

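// $user_agents: an array of browser User-Agent strings, defined elsewhere
$headers = array(
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding: zip, deflate, sdch',
    'Accept-Language: en-US,en;q=0.8',
    'Cache-Control: max-age=0',
    'User-Agent: ' . $user_agents[array_rand($user_agents)], // random User-Agent per request
);
$curl_init = curl_init();
curl_setopt($curl_init, CURLOPT_URL, $url);
curl_setopt($curl_init, CURLOPT_HTTPHEADER, $headers);
curl_setopt($curl_init, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
$output = curl_exec($curl_init);
curl_close($curl_init);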

It works well.

But I am using PHP Goutte, and I want to send the same request with this library:

use Goutte\Client;

$headers2 = array(
    'Accept'          => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding' => 'zip, deflate, sdch',
    'Accept-Language' => 'en-US,en;q=0.8',
    'Cache-Control'   => 'max-age=0',
    'User-Agent'      => $user_agents[array_rand($user_agents)],
);
$client = new Client();

// Register each header on the Goutte client before sending the request
foreach ($headers2 as $key => $v) {
    $client->setHeader($key, $v);
}
$resp = $client->request('GET', $url); // returns a DomCrawler Crawler
echo $resp->html();

But with this code, the site I am scraping blocks me.

How can I set the headers properly with Goutte?
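The two header arrays in the question carry the same data in two shapes: `CURLOPT_HTTPHEADER` takes a list of `"Name: value"` strings, while the Goutte loop iterates a `name => value` map. A minimal sketch of keeping one map as the single source for both clients, so the two requests are easier to compare; the function name `headerMapToCurlLines` is hypothetical, not part of cURL or Goutte:

```php
<?php
/**
 * Hypothetical helper: turn a name => value header map (the shape used in
 * the Goutte loop) into the "Name: value" strings CURLOPT_HTTPHEADER expects.
 */
function headerMapToCurlLines(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

// Example map; the User-Agent is a sample browser string, not a recommendation.
$headerMap = [
    'Accept-Language' => 'en-US,en;q=0.8',
    'Cache-Control'   => 'max-age=0',
    'User-Agent'      => 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0',
];

print_r(headerMapToCurlLines($headerMap));
```

The resulting list can be passed to `curl_setopt($ch, CURLOPT_HTTPHEADER, ...)` and the map itself to the `setHeader()` loop, so both clients pick the same random User-Agent when it is chosen once up front.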



1 answer

  • dongya2029 2016-07-15 16:59
    Accepted

    Can you check what Goutte actually gets back? Print the response status code:

    $status_code = $client->getResponse()->getStatus();
    echo $status_code;
    

    This is the code that worked for me, using Guzzle directly. In index.php:

    <?php
        ini_set('display_errors', 1);
    ?>
    <html>
    <head><meta charset="utf-8" /></head>
    <?php
        $begin = microtime(true);
        require 'vendor/autoload.php';
        require 'helpers/helper.php';
        $client = new GuzzleHttp\Client([
            'base_uri' => 'http://www.yellowpages.com.au',
            'cookies' => true,
            'headers' =>  [
                'Accept'          => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
                'Accept-Encoding' => 'zip, deflate, sdch', 
                'Accept-Language' => 'en-US,en;q=0.8', 
                'Cache-Control'   => 'max-age=0',
                'User-Agent'      => 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0'
            ]
        ]);
        $helper = new Helper($client);
        $mostViewed = $helper->getPageTest();
    ?>
    </html>
    

    In helper.php:

    <?php
    use GuzzleHttp\ClientInterface;
    use Symfony\Component\DomCrawler\Crawler;
    class Helper{
        protected $client;
        protected $totalPages;
        public function __construct(ClientInterface $client){
            $this->client       = $client;
            $this->totalPages   = 3;
        }
        public function query()
        {
            $queries = array(
                'clue'  => 'Builders',
                'locationClue'  => 'Sydney, 2000', // plain text: Guzzle URL-encodes the query array itself
                'mappable' => 'true',
                'selectedViewMode' => 'list'
            );
            // print_r($queries);
            return $this->client->get('search/listings', array('query' => $queries));
        }
        public function getPageTest()
        {
            $responses = $this->query();
            $html = $responses->getBody()->getContents();
            echo $html;
            exit();
        }
    }
    ?>
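Guzzle's `query` request option encodes the array it receives, so query values should be passed as plain text; a value that is already percent-encoded, such as `'Sydney%2C+2000'`, gets encoded a second time. A minimal sketch of that effect, assuming Guzzle 6 builds the string with `http_build_query()` and RFC 3986 encoding (which is how I understand its `query` handling, not something the answer states):

```php
<?php
// Reproduce the encoding step assumed for Guzzle 6's `query` option:
// http_build_query() with PHP_QUERY_RFC3986 (rawurlencode-style escaping).
$preEncoded = ['locationClue' => 'Sydney%2C+2000']; // already encoded by hand
$plain      = ['locationClue' => 'Sydney, 2000'];   // raw value

$doubleEncoded = http_build_query($preEncoded, '', '&', PHP_QUERY_RFC3986);
$singleEncoded = http_build_query($plain, '', '&', PHP_QUERY_RFC3986);

echo $doubleEncoded, "\n"; // locationClue=Sydney%252C%2B2000  ('%' became '%25')
echo $singleEncoded, "\n"; // locationClue=Sydney%2C%202000
```

Some servers tolerate the double-encoded form, but passing the raw value leaves the encoding to one place.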
    

    And this is the result I got:

    [screenshot of the result]

    Hope this helps!
