douhuan6305 2014-05-11 20:45

Replicating an HTTP request with PHP and cURL

I am trying to request a password protected page from something called "CM/ECF" (Pacer) to view court dockets and such with PHP/cURL.

I am using a Firefox extension called Tamper Data, which lets me see headers and POST data, and I am trying to replicate that request in PHP using cURL.

It's not working for some reason: I keep getting a request to log in. I can log in just fine, save the cookie to the cookie jar, and get the "Main" page, but when I make a second cURL call (sending the same cookie) to the search page, the host redirects me to a login page.

Two-part question. Part 1: when I use Tamper Data to view the cookies that are sent when I request the page, it shows me this:

PacerUser="xxxxxxxxxxx                               xxxxxxx"; 
PacerSession="xxxxxSW8+F/BCzRxxxxxxhYtWpfO4ZR8WTEYbnaeeoVixAp5YnKMWxxxxxx0U8MoEPt2FOxxxxxxx/5B9ujb"; 
PacerPref="receipt=Y"; 
PacerClientCode=""; 
__utma=20643455934534311.139983455.139934505.13998383455.1; 
__utmb=206345345.10.13453405; 
__utmc=2053453433351; 
__utmz=20653453351.1399345345.1.utmcsr=pacer.gov|utmccn=(referral)|utmcmd=referral|utmcct=/cmecf/developer/

But the cookie file generated by libcurl doesn't include any of the lines that begin with an underscore. What are those?

Here's the request my browser makes, copied from TamperData:

Host=ecf.almb.uscourts.gov
User-Agent=Mozilla/5.0 (Windows NT 6.3; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0
Accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language=en-US,en;q=0.5
Accept-Encoding=gzip, deflate
DNT=1
Cookie=PacerUser="wmasdfasdf                                ZFBgasdfasdfsdff PacerSession="7rkPasdfasdfasdfasdfasdfsdadfnaeeoVixAp5YnKMW9lokKeq4ss4m0U8MoEPt2FOj2P/51RLh/5B9ujb"; PacerPref="receipt=Y"; PacerClientCode=""; __utma=203145253483351.15234521.13998234523405.139234505.139982345305.1; __utmc=2034533351; __utmz=206453453351.14538105.1.1.utmcsr=pacer.gov|utmccn=(referral)|utmcmd=referral|utmcct=/cmecf/developer/
Connection=keep-alive
Cache-Control=max-age=0

Here's my PHP:

$Headers = array(
    "Host: ".$this->CaseFiled_endpoints[$district],
    "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language: en-US,en;q=0.5",
    "Accept-Encoding: gzip, deflate",
    "Connection: keep-alive"
);        


$url = "https://".$this->CaseFiled_endpoints[$district]."/cgi-bin/CaseFiled-Rpt.pl";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0');
curl_setopt($ch, CURLOPT_HTTPHEADER, $Headers);
curl_setopt($ch, CURLOPT_REFERER, $url); 
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, realpath($this->cookiefile));
curl_setopt($ch, CURLOPT_COOKIEFILE, realpath($this->cookiefile));
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$answer2 = curl_exec($ch);

return curl_getinfo($ch);

Is there anything blatantly wrong with my code? Are there any other tools that might make this easier? A browser plugin that spits out curl code?

2 answers

  • dsgft1486 2014-07-05 22:16

    Here is the magic soup you are missing: a $cookie file passed to curl_setopt.

    curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookie); 
    curl_setopt ($ch, CURLOPT_COOKIEFILE, $cookie);
    

    Then you would first POST to the login form with cURL and save the cookie file. On subsequent requests, check the cookie file's modification time (to see whether it is out of date), and either log in again to create a new cookie or send the existing $cookie file.

    Note that I don't have this line:

    curl_setopt($ch, CURLOPT_COOKIESESSION, true);
    

    Also note http://curl.haxx.se/libcurl/c/CURLOPT_COOKIESESSION.html, which says:

    Pass a long set to 1 to mark this as a new cookie "session". It will force libcurl to ignore all cookies it is about to load that are "session cookies" from the previous session. By default, libcurl always stores and loads all cookies, independent if they are session cookies or not. Session cookies are cookies without expiry date and they are meant to be alive and existing for this "session" only.

    I think you are telling it to start a new session every time.
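A minimal sketch of the two-request flow described above, not the code from either post. The helper names `buildCurlOptions()` and `fetch()` are hypothetical, and the sketch assumes the cookie file path is absolute and writable. The point it illustrates is that every request in one logical session uses the same cookie file for both CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR, and none of them sets CURLOPT_COOKIESESSION.

```php
<?php
// Hypothetical helper (name and signature are illustrative): builds the
// option set shared by every request that belongs to one logical session.
function buildCurlOptions(string $url, string $cookieFile, ?array $post = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        // Same file for reading and writing, so cookies stored at login
        // (including session cookies) are sent on later requests.
        CURLOPT_COOKIEFILE     => $cookieFile,
        CURLOPT_COOKIEJAR      => $cookieFile,
        // Deliberately no CURLOPT_COOKIESESSION here.
    ];
    if ($post !== null) {
        $options[CURLOPT_POST] = true;
        $options[CURLOPT_POSTFIELDS] = http_build_query($post);
    }
    return $options;
}

// Hypothetical helper: runs one request and returns the response body.
function fetch(array $options): string
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    // The handle is released when $ch goes out of scope.
    return $body;
}
```

Usage would be two calls to `fetch()` with the same `$cookieFile`: first with the login fields passed as `$post`, then with the search page URL.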

    P.S. I use PACER as well.

    public function Login(){
            $cookie_file = __DIR__."/cookie.txt";
            $cookie_file = str_replace("\\", "/", $cookie_file);
            $this->_cookie_file = $cookie_file;
            $new_file = false;
            if(!is_file($cookie_file)){
                $h = fopen($cookie_file, "w");
                fclose($h);
                $file_time = time();
                $new_file = true;
            }else{
                $file_time = filemtime($cookie_file);
            }
    
            //login
            if($file_time < (time() - 1800) || $new_file){
                $url = "https://pacer.login.uscourts.gov/cgi-bin/check-pacer-passwd.pl";
                $post = array(
                        "loginid"=>"loginID",
                        "passwd"=>"password",
                        "client"=> "client",
                        "faction"=>"Login",
                        "appurl"=>"https://pcl.uscourts.gov/search"
                );
    
    
                $res = $this->_cUrl->cPost($url, $post, $cookie_file);
                $this->Log("LOGGING IN AT ".date("Y-m-d H:i:s"));
                $this->Log("SLEEPING 2 ..",E_USER_DEPRECATED);
                sleep(2);
            }
    
        }
    

    From my cURL library class:

    public function cPost($url, $post, $cookie_file="cookie.txt"){
            if(is_array($post)){
                $post_string = $this->encodePost($post);
            }else{
                $post_string = $post;
            }
    
            $cookie = str_replace("\\", "/", $cookie_file);
            $fc = fopen($cookie, "r");
            fclose($fc);
            $ch = curl_init();
    
            curl_setopt($ch, CURLOPT_VERBOSE, 1);
            curl_setopt($ch, CURLOPT_STDERR, $this->_error_handle); 
            fwrite($this->_error_handle, "Starting debug file ".date('Y-m-d H:i:s')."\n");
    
            curl_setopt ($ch, CURLOPT_URL, $url); 
            curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE); 
            curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"); 
            curl_setopt ($ch, CURLOPT_TIMEOUT, 60); 
            curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1); 
            curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1); 
            curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookie); 
            curl_setopt ($ch, CURLOPT_COOKIEFILE, $cookie);
            curl_setopt ($ch, CURLOPT_REFERER, $url); 
            curl_setopt($ch, CURLINFO_HEADER_OUT, true); // enable tracking
            curl_setopt ($ch, CURLOPT_POSTFIELDS, $post_string); 
            curl_setopt ($ch, CURLOPT_POST, 1); 
            $result = curl_exec ($ch); 
            if ( curl_errno($ch) ) {
                $response = 'ERROR -> ' . curl_errno($ch) . ': ' . curl_error($ch);
                throw new CurlException($response);
            } else {
                $returnCode = (int)curl_getinfo($ch, CURLINFO_HTTP_CODE);
                switch($returnCode){
                    case 404:
                        $response = 'ERROR -> 404 Not Found';
                        throw new CurlException($response, CurlException::ER_RETURN_CODE);
                    break;
                    default:
    
                    break;
                }
            }
            curl_close($ch);
            return $result;
        }
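
The `encodePost()` method that `cPost()` calls is not shown in this answer. The sketch below is only one plausible version, written as a standalone function with the hypothetical name `encodePostFields()`. It assumes that a list value such as `nos` in the search request should be sent as one `name=value` pair per element. PHP's built-in `http_build_query()` would instead emit bracketed keys such as `nos[0]`, so which form the server expects would need to be checked against the browser's actual request.

```php
<?php
// Hypothetical stand-in for the encodePost() method used by cPost();
// the original implementation is not shown, so this is an assumption.
function encodePostFields(array $fields): string
{
    $pairs = [];
    foreach ($fields as $name => $value) {
        // A list value is expanded into one name=value pair per element.
        $values = is_array($value) ? $value : [$value];
        foreach ($values as $item) {
            $pairs[] = urlencode((string) $name) . '=' . urlencode((string) $item);
        }
    }
    // Join the pairs into an application/x-www-form-urlencoded body.
    return implode('&', $pairs);
}
```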
    

    To access their search form:

    $url = "https://pcl.uscourts.gov/dquery";
            $post = array(
                "case_no"=>$case_no,
                "mdl_id"=>"",
                "stitle"=>"",
                "nos"=> array(
                        "370",
                        "371",
                        "440",
                        "470",
                        "480",
                        "890"
                    ),
                "date_filed_start"=>$date_filed_start,
                "date_filed_end"=>$date_filed_end,
                "date_term_start"=>"",
                "date_term_end"=>"",
                "date_dismiss_start"=>"",
                "date_dismiss_end"=>"",
                "date_discharge_start"=>"",
                "date_discharge_end"=>"",
                "party"=>$party,
                "ssn4"=>"",
                "ssn"=>"",
                "court_type"=>"cv",
                "default_form"=>"cvb"
            );
    
            print_r($post);
    
            $html = $this->_cUrl->cPost($url, $post, $this->_cookie_file);
    

    I have had this code in a production environment for over a year now, so here are the keys to the kingdom, lol.

    This answer was accepted by the asker.