dsjojts9734 2011-10-28 16:56
Accepted

How to handle secure cookies with a web crawler [closed]

I'm trying to automate some tasks on a PHP site served by nginx. I can log in, but subsequent requests to the rest of the site fail because of several cookies I'm unable to capture. When I inspect the response headers, it's as if they don't exist: all I get is PHPSESSID and SERVERID, and I'm missing five others, although I can see them in my browser's cookies. I think only one of them is used as a persistent authentication token. I've tried jsoup, java.net.URL, and LWP/WWW::Mechanize in Perl. I should be able to get them, since Burp is written in Java.

http: REMOVED
POST /authenticate.php HTTP/1.1
Host: REMOVED
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.23) Gecko/20110920 Firefox/3.6.23
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Proxy-Connection: keep-alive
Referer: REMOVED
Cookie: __utma=35782181.1596497020.1319574836.1319750878.1319821717.7; __utmv=35782181.|1=SignupDate=2011-OCT-24=1;uid="MTU5MTY4Ng==|1319649169|e4db70a9171742176a944f4fdc3613fd963b1b7e";username="dGVzdF9sb2dpbg==|1319649169|b82e24618b06d6b14d7ea64600c84a2d20c3de73"; defaultstat1=10; defaultstat3=10; SERVERID=ww4; PHPSESSID=53a7cd9acbb71ed7e7cc7be680e6c99c; __utmb=35782181.1.10.1319821717; __utmc=35782181; mode=full
Content-Type: application/x-www-form-urlencoded
Content-Length: 57

username=test_login&password=login123&btnLogin=Login

HTTP/1.0 302 Moved Temporarily
Server: nginx
Date: Fri, 28 Oct 2011 17:09:08 GMT
Content-Type: text/html
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: secret=99ba70c185973be0cd25e0f12dd1ea72; path=/
Location: REMOVED
X-Cache: MISS from REMOVED
Via: 1.0 REMOVED (http_scan/4.0.2.6.19)
Proxy-Connection: close

JSoup:

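import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;

// POST the login form and keep the raw response
Connection.Response res = Jsoup.connect(url)
    .data("username", username)
    .data("password", password)
    .method(Connection.Method.POST)
    .execute();

// Cookies parsed from the Set-Cookie headers of the response
Map<String, String> cookies = res.cookies();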

The returned cookies contain only PHPSESSID and SERVERID.
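Since the failure shows up on the requests made after login, it may help to see how cookies from several responses could be accumulated and replayed. The following is a minimal sketch using only the Java standard library; the class name `CookieJarSketch` and its methods are hypothetical and not part of jsoup, and the cookie values are the ones from the capture above. It assumes each response's cookies arrive as a name-to-value map, such as the one returned by `res.cookies()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: collects cookies across responses and
// renders them as a single Cookie request header value.
public class CookieJarSketch {
    // Cookies collected so far, kept in insertion order (name -> value).
    private final Map<String, String> jar = new LinkedHashMap<>();

    /** Merge the cookies of one response; later values overwrite earlier ones. */
    public void merge(Map<String, String> responseCookies) {
        jar.putAll(responseCookies);
    }

    /** Build the value of the Cookie header for the next request. */
    public String cookieHeader() {
        StringJoiner joined = new StringJoiner("; ");
        jar.forEach((name, value) -> joined.add(name + "=" + value));
        return joined.toString();
    }

    public static void main(String[] args) {
        CookieJarSketch jar = new CookieJarSketch();

        // Cookies from an earlier response (values taken from the capture).
        Map<String, String> session = new LinkedHashMap<>();
        session.put("PHPSESSID", "53a7cd9acbb71ed7e7cc7be680e6c99c");
        session.put("SERVERID", "ww4");
        jar.merge(session);

        // Cookie set by the 302 response to the login POST.
        jar.merge(Map.of("secret", "99ba70c185973be0cd25e0f12dd1ea72"));

        System.out.println(jar.cookieHeader());
    }
}
```

With jsoup, a map accumulated this way could be passed to the next request through `Connection.cookies(Map)` instead of building the header by hand.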

1 answer

  • doudizhi947129 2011-10-28 17:45

    The __utma, __utmb, __utmc, and __utmv cookies in your sample are Google Analytics cookies, and they're set via JavaScript. Unless the crawler you're writing can execute JavaScript, those cookies will never be set in the crawler.

    What you see in your browser is utterly irrelevant for fixing this - it's what the crawler sees, gets, and can do that counts.
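One way to check what the crawler itself receives is to parse the raw Set-Cookie header values from a response, rather than comparing against the browser's cookie list. The sketch below uses `java.net.HttpCookie.parse` from the standard library; the class name `SetCookieInspector` and the method `cookieNames` are hypothetical, and the header value is the one from the 302 response in the question.

```java
import java.net.HttpCookie;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the cookie names a server actually sent,
// given the raw values of its Set-Cookie response headers.
public class SetCookieInspector {
    static List<String> cookieNames(List<String> setCookieValues) {
        List<String> names = new ArrayList<>();
        for (String value : setCookieValues) {
            // HttpCookie.parse returns the cookies described by one header value.
            for (HttpCookie cookie : HttpCookie.parse(value)) {
                names.add(cookie.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Header value copied from the captured response to the login POST.
        List<String> headers = List.of(
            "secret=99ba70c185973be0cd25e0f12dd1ea72; path=/");
        System.out.println(cookieNames(headers));
    }
}
```

Any cookie that never appears in such a list for any response was not sent by the server over HTTP, which is consistent with it being created by script in the browser.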

    Selected by the asker as the best answer.
