dongxin8709 2016-09-08 23:38


I am trying to fetch an RSS feed using the code below.


$client  = new \GuzzleHttp\Client(['User-Agent' => 'idap']);
$content = $client->request('GET', '');


and it returns the following:




<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta http-equiv="Content-Script-Type" content="text/javascript">
<script type="text/javascript">
function getCookie(c_name) { // Local function for getting a cookie value
    if (document.cookie.length > 0) {
        c_start = document.cookie.indexOf(c_name + "=");
        if (c_start!=-1) {
            c_start=c_start + c_name.length + 1;
            c_end=document.cookie.indexOf(";", c_start);
            if (c_end==-1)
                c_end = document.cookie.length;
            return unescape(document.cookie.substring(c_start,c_end));
        }
    }
    return "";
}
function setCookie(c_name, value, expiredays) { // Local function for setting a value of a cookie
    var exdate = new Date();
    document.cookie = c_name + "=" + escape(value) + ((expiredays==null) ? "" : ";expires=" + exdate.toGMTString()) + ";path=/";
}
function getHostUri() {
    var loc = document.location;
    return loc.toString();
}
setCookie('YPF8827340282Jdskjhfiw_928937459182JAX666', '', 10);
try {
} catch (err1) {
    try {
    } catch (err2) {
        location.href = getHostUri();
    }
}
</script>
<noscript>This site requires JavaScript and Cookies to be enabled. Please change your browser settings or upgrade your browser.</noscript>



How can I get the RSS feed from this link? Also, many sites do not include the full description in their RSS feeds. How can I retrieve the complete description in code, and what should I do if the RSS file is large and takes a long time to load?



1 answer

  • dongti7838 2016-09-09 08:21

    I have updated my answer to use GuzzleHttp\Client. I tested this code myself, and it works with Guzzle version ^6.2. To be safe, install that specific version with Composer. I assume you know how to get the code below up and running with Composer.


    When we try to visit the RSS feed, the server first looks for a cookie associated with the IP address the request comes from. If it does not find a cookie for that IP, it sets one of the form Cookie_Hash:IP. The part of the code that sets the cookie is:

    setCookie('YPF8827340282Jdskjhfiw_928937459182JAX666', '', 10);

    Once the cookie is set, the JavaScript code redirects the browser. After the redirect, because the cookie is now set for that IP, the request completes successfully and the complete RSS feed is sent to the browser.
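
    The extraction step described above can be tried in isolation, without any network access. This is only a minimal sketch: the helper name extractChallengeCookie, the sample page, and the IP 203.0.113.7 (a documentation-range placeholder) are assumptions for illustration and do not come from the real site.

```php
<?php
// Hypothetical helper: pull the cookie name and value out of the challenge page.
function extractChallengeCookie(string $html): ?array
{
    // Matches the call setCookie('<name>', '<value>', ...).
    // The function definition does not match, because its first argument is unquoted.
    if (preg_match("/setCookie\\('([^']*)',\\s*'([^']*)'/", $html, $m) !== 1) {
        return null;
    }
    return ['name' => $m[1], 'value' => $m[2]];
}

// Sample challenge page (assumption): 203.0.113.7 stands in for the client IP.
$sample = <<<'HTML'
<script type="text/javascript">
function setCookie(c_name, value, expiredays) { /* ... */ }
setCookie('YPF8827340282Jdskjhfiw_928937459182JAX666', '203.0.113.7', 10);
</script>
HTML;

$cookie = extractChallengeCookie($sample);

// Build the value for the Cookie request header, or an empty string if nothing was found.
$cookieHeader = $cookie === null ? '' : $cookie['name'] . '=' . $cookie['value'];
echo $cookieHeader, PHP_EOL;
// prints: YPF8827340282Jdskjhfiw_928937459182JAX666=203.0.113.7
```

    Returning null when the call is missing lets the caller tell a challenge page apart from a response that is already the feed.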

    You can read the full JavaScript source in the response to see where all of this happens. The request headers that need to be sent with our Guzzle request can easily be copied from the request headers a browser sends, using the developer tools in Chrome or Firefox.

    Let us know if anything is unclear.

    require_once 'vendor/autoload.php';

    $client = new \GuzzleHttp\Client([
        'base_uri' => '',
        'cookies' => true,
    ]);

    // First request: the server answers with the JavaScript challenge page.
    $res = $client->request('GET', '/.mrss/ar.xml');
    $firstResponse = (string) $res->getBody();

    // Search for the following call in the page:
    // setCookie('YPF8827340282Jdskjhfiw_928937459182JAX666', '', 10);
    // You may have to adjust this pattern if the page changes.
    $pattern = "/setCookie\('([^']*)',\s*'([^']*)'/";
    if (!preg_match($pattern, $firstResponse, $matches)) {
        exit('Could not find the setCookie() call in the response.');
    }
    $cookieName  = $matches[1]; // YPF8827340282Jdskjhfiw_928937459182JAX666
    $cookieValue = $matches[2]; // second argument of setCookie()

    // Second request: set the cookie value, Cookie: $cookieName=$cookieValue
    $res = $client->request('GET', '/.mrss/ar.xml', [
        'headers' => [
            'User-Agent' => 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 ' .
                '(KHTML, like Gecko) Chrome/53.0.2785.89 Safari/537.36',
            'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9',
            'Accept-Encoding' => 'gzip, deflate, sdch',
            'Cookie' => ["$cookieName=$cookieValue"],
            'Referer' => '',
            'Upgrade-Insecure-Requests' => 1,
            'Connection' => 'keep-alive',
        ],
        // 'debug' => false, // Set to true for debugging
    ]);
    echo $res->getBody();

    Note: I have tested this code with "guzzlehttp/guzzle": "^6.2".
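
    The question also asks about large feeds. One possible approach, which is not part of the tested code above, is to read the XML incrementally with PHP's XMLReader instead of loading the whole document into memory. The sketch below is an assumption-laden illustration: the helper readItemTitles and the sample feed are made up, and a real feed may need namespace handling.

```php
<?php
// Hypothetical helper: collect the <title> of every <item> without building a full tree.
function readItemTitles(XMLReader $reader): array
{
    $titles = [];
    $inItem = false;
    while ($reader->read()) {
        $isStart = $reader->nodeType === XMLReader::ELEMENT;
        $isEnd   = $reader->nodeType === XMLReader::END_ELEMENT;
        if ($isStart && $reader->name === 'item') {
            // A self-closing <item/> has no end tag, so it never "opens" an item.
            $inItem = !$reader->isEmptyElement;
        } elseif ($isEnd && $reader->name === 'item') {
            $inItem = false;
        } elseif ($inItem && $isStart && $reader->name === 'title') {
            $titles[] = $reader->readString();
        }
    }
    return $titles;
}

// Small made-up feed standing in for a large one.
$sampleFeed = <<<'XML'
<rss version="2.0"><channel><title>Sample feed</title>
<item><title>First</title><description>One</description></item>
<item><title>Second</title><description>Two</description></item>
</channel></rss>
XML;

$reader = new XMLReader();
$reader->XML($sampleFeed); // for a feed saved to disk, $reader->open($path) reads it as a stream
$titles = readItemTitles($reader);
$reader->close();

print_r($titles);
```

    For slow servers, Guzzle's timeout and connect_timeout request options can bound the wait, and the sink option can write the body to a file that XMLReader then opens; whether that is enough depends on the feed.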

    This answer was accepted as the best answer by the asker.


