2016-09-08 23:38


I am trying to fetch an RSS feed using the code below.


$client  = new \GuzzleHttp\Client(['User-Agent' => 'idap']);
$content = $client->request('GET', '');


and it returns the following:




<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<meta http-equiv="Content-Script-Type" content="text/javascript">

<script type="text/javascript">

function getCookie(c_name) { // Local function for getting a cookie value

    if (document.cookie.length > 0) {

        c_start = document.cookie.indexOf(c_name + "=");

        if (c_start!=-1) {

        c_start=c_start + c_name.length + 1;

        c_end=document.cookie.indexOf(";", c_start);

        if (c_end==-1) 

            c_end = document.cookie.length;

        return unescape(document.cookie.substring(c_start,c_end));



    return "";


function setCookie(c_name, value, expiredays) { // Local function for setting a value of a cookie

    var exdate = new Date();


    document.cookie = c_name + "=" + escape(value) + ((expiredays==null) ? "" : ";expires=" + exdate.toGMTString()) + ";path=/";


function getHostUri() {

    var loc = document.location;

    return loc.toString();


setCookie('YPF8827340282Jdskjhfiw_928937459182JAX666', '', 10);

try {  


} catch (err1) {  

    try {  


    } catch (err2) {  

        location.href = getHostUri();






<noscript>This site requires JavaScript and Cookies to be enabled. Please change your browser settings or upgrade your browser.</noscript>



How can I get the RSS feed from the link? Also, many sites do not include the full description in their RSS. How can I fetch the complete description in code like did, and what should I do if the RSS file is large and takes a long time to load?



1 answer

  • dongti7838 2016-09-09 08:21

    I have updated my answer to use GuzzleHttp\Client. I have tested this code myself, and it works with GuzzleHttp version ^6.2. To be safe, use Composer to install that specific version. I assume you know how to get the code below up and running with Composer.


    When we request the RSS feed, the server first looks for a cookie tied to the IP address the request comes from. If it does not find one, it responds with a JavaScript page that sets a cookie of the form Cookie_Hash:IP, where the cookie name is a hash and the value is the IP. The code that sets the cookie is:

    setCookie('YPF8827340282Jdskjhfiw_928937459182JAX666', '', 10);

    Once the cookie is set, the JavaScript code redirects the browser back to the same URL. Because the cookie now exists for that IP, the second request completes successfully and the complete RSS feed is sent to the browser.
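
    As a rough sketch of the extraction step, the cookie name and value can be pulled out of the setCookie(...) call with a pattern that targets the call itself instead of relying on match positions. The helper name extractChallengeCookie and the sample IP 203.0.113.7 are hypothetical, and the pattern assumes the page keeps the single-quoted setCookie('name', 'value', days) form shown above. It needs PHP 7.1 or later.

    ```php
    <?php
    // Hypothetical helper: returns [name, value] from the first
    // setCookie('name', 'value', ...) call, or null if there is none.
    function extractChallengeCookie(string $html): ?array
    {
        // Two single-quoted arguments followed by a comma. The function
        // definition "setCookie(c_name, ..." has no quotes, so it is skipped.
        $pattern = "/\\bsetCookie\\(\\s*'([^']*)'\\s*,\\s*'([^']*)'\\s*,/";
        if (preg_match($pattern, $html, $m) !== 1) {
            return null;
        }
        return [$m[1], $m[2]];
    }

    // Sample input modeled on the page above; the IP is a made-up placeholder.
    $sample = "setCookie('YPF8827340282Jdskjhfiw_928937459182JAX666', '203.0.113.7', 10);";
    [$name, $value] = extractChallengeCookie($sample);
    echo "$name=$value", PHP_EOL;
    ```

    If the helper returns null, the response was probably not the cookie page, and there is nothing to extract.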

    You can read the full JavaScript source in the response to see where all of this happens. The headers to send with our Guzzle request can be copied from the request headers a browser sends, using the developer tools in Chrome or Firefox.

    Let us know if anything is unclear.

    require_once 'vendor/autoload.php';

    $client = new \GuzzleHttp\Client([
        'base_uri' => '',
        'cookies' => true,
    ]);

    // First request: the response is the JavaScript page that sets the cookie
    $res = $client->request('GET', '/.mrss/ar.xml');
    $firstResponse = (string) $res->getBody();

    // Search for the following string
    // setCookie('YPF8827340282Jdskjhfiw_928937459182JAX666', '', 10);
    $pattern = '/[^setCookie\(\')](.*?),/';
    preg_match_all($pattern, $firstResponse, $matches);

    // You may have to adjust these indexes if the page changes
    $cookie = $matches[1][4]; // YPF8827340282Jdskjhfiw_928937459182JAX666
    $ip     = $matches[1][5]; //
    $cookieName  = explode("'", $cookie)[1];
    $cookieValue = explode("'", $ip)[1];

    // Second request: send the cookie ourselves, Cookie: $cookieName=$cookieValue
    $res = $client->request('GET', '/.mrss/ar.xml', [
        'headers' => [
            'User-Agent' => 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 ' .
                '(KHTML, like Gecko) Chrome/53.0.2785.89 Safari/537.36',
            'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9',
            'Accept-Encoding' => 'gzip, deflate, sdch',
            'Cookie' => ["$cookieName=$cookieValue"],
            'Referer' => '',
            'Upgrade-Insecure-Requests' => '1',
            'Connection' => 'keep-alive',
        ],
        // 'debug' => false, // Set to true for debugging
    ]);
    echo $res->getBody();

    Note: I have tested this code with "guzzlehttp/guzzle": "^6.2".
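
    Since the match positions may need adjusting, it can help to check whether a response is the JavaScript cookie page at all before trying to parse it. The following is only a sketch: the helper name looksLikeCookieChallenge is hypothetical, and it assumes the real feed has an <rss> or <feed> root element while the cookie page calls setCookie('...').

    ```php
    <?php
    // Hypothetical helper: true when the body looks like the JavaScript
    // cookie page rather than an actual feed.
    function looksLikeCookieChallenge(string $body): bool
    {
        $callsSetCookie = strpos($body, "setCookie('") !== false;
        $hasFeedRoot    = stripos($body, '<rss') !== false
            || stripos($body, '<feed') !== false;
        return $callsSetCookie && !$hasFeedRoot;
    }

    // Made-up sample bodies for illustration.
    $challenge = "<script>setCookie('name', 'value', 10);</script>"
        . '<noscript>This site requires JavaScript</noscript>';
    $feed = '<rss version="2.0"><channel><title>t</title></channel></rss>';

    var_dump(looksLikeCookieChallenge($challenge)); // bool(true)
    var_dump(looksLikeCookieChallenge($feed));      // bool(false)
    ```

    With a check like this, the second request only needs to be made when the first response turns out to be the cookie page.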
