dongpiao1983 2012-11-05 22:20

Scraping data from a website with cURL after logging in?

I am trying to log in to a website and then grab data from a table, since the site has no export feature. So far I have managed to log in, and the response shows me the user homepage. However, I need to navigate to a different page, or otherwise fetch that page, while still logged in with cURL.

My code so far:

$username="email"; 
$password="password"; 
$url="https://jiltapp.com/sessions"; 
$cookie="cookie.txt";
$url2 = "https://jiltapp.com/shops/shopname/orders";

$postdata = "email=".$username."&password=".$password; 

$ch = curl_init(); 
curl_setopt ($ch, CURLOPT_URL, $url); 
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE); 
curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"); 
curl_setopt ($ch, CURLOPT_TIMEOUT, 60); 
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1); 
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookie); 
curl_setopt ($ch, CURLOPT_REFERER, $url); 

curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata); 
curl_setopt ($ch, CURLOPT_POST, 1); 
$result = curl_exec ($ch); 

echo $result;  
curl_close($ch);

As I mentioned, I get access to the main user page, but I need to grab the contents of the page at $url2, not $url. How can I accomplish that?

Thank you!

1 answer

  • doushuangdui5419 2012-11-05 22:24

    Once logged in, make a second request for the page that contains the data you are after.

    For subsequent requests, set the CURLOPT_COOKIEFILE option to the same file that CURLOPT_COOKIEJAR points to. cURL will read cookies from this file and send them with the request.

    $username="email"; 
    $password="password"; 
    $url="https://jiltapp.com/sessions"; 
    $cookie="cookie.txt";
    $url2 = "https://jiltapp.com/shops/shopname/orders";
    
    $postdata = http_build_query(array("email" => $username, "password" => $password));  // URL-encodes special characters in the credentials
    
    $ch = curl_init(); 
    curl_setopt ($ch, CURLOPT_URL, $url); 
    curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE); 
    curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"); 
    curl_setopt ($ch, CURLOPT_TIMEOUT, 60); 
    curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1); 
    curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1); 
    curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookie); 
    curl_setopt ($ch, CURLOPT_COOKIEFILE, $cookie);  // <-- add this line
    curl_setopt ($ch, CURLOPT_REFERER, $url); 
    
    curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata); 
    curl_setopt ($ch, CURLOPT_POST, 1); 
    $result = curl_exec ($ch); 
    
    echo $result;  
    
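    // Make the second request on the same handle so the session cookies are sent again.
    
    $url = 'page you want to get data from';  // e.g. $url2 defined above
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, 0);  // switch back from POST to a plain GET
    
    $data = curl_exec($ch);  // HTML of the target page, or false on failure
    curl_close($ch);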
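
Since the goal in the question is to pull rows out of a table, here is a minimal sketch of how the HTML returned by the second request might be turned into an array. It is not part of the original answer: the helper name `extractTableRows` is hypothetical, the sample markup is invented for illustration, and the code assumes the data sits in the first `<table>` on the page, which may not hold for the real site.

```php
<?php
// Hypothetical helper (not from the original answer): convert the first
// <table> found in an HTML string into an array of rows of cell text.
function extractTableRows(string $html): array
{
    // Nothing to parse (for example, the request failed and returned false/empty).
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world pages are rarely valid markup, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    // Assumption: the data of interest lives in the first table on the page.
    foreach ($xpath->query('(//table)[1]//tr') as $tr) {
        $cells = [];
        // Header and data cells that are direct children of this row.
        foreach ($xpath->query('./th|./td', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Invented stand-in for the $data returned by the second curl_exec() call.
$sample = '<table><tr><th>Order</th><th>Total</th></tr>'
        . '<tr><td>1001</td><td>19.99</td></tr></table>';
print_r(extractTableRows($sample));
```

In the script above, this would be called as `extractTableRows($data)` after the second request. If the page contains several tables, the XPath expression would need to target the right one, for instance by an `id` or `class` attribute taken from the actual markup.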
    