dsjojts9734 2011-10-28 16:56

How to handle secure cookies with a web crawler [closed]

I'm trying to automate some tasks on a PHP site served by nginx. I can log in, but subsequent requests to the rest of the site fail because of several cookies I'm unable to capture. When I inspect the response headers, it's as if they don't exist: all I get is PHPSESSID and SERVERID, and I'm missing five others, although I can see them in my browser's cookies. I think only one of them is used as a persistent authentication token. I've tried JSoup, Java's URL class, and LWP/Mechanize in Perl. It should be possible to get them, since Burp was written in Java.

http: REMOVED
POST /authenticate.php HTTP/1.1
Host: REMOVED
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.23) Gecko/20110920 Firefox/3.6.23
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Proxy-Connection: keep-alive
Referer: REMOVED
Cookie: __utma=35782181.1596497020.1319574836.1319750878.1319821717.7; __utmv=35782181.|1=SignupDate=2011-OCT-24=1;uid="MTU5MTY4Ng==|1319649169|e4db70a9171742176a944f4fdc3613fd963b1b7e";username="dGVzdF9sb2dpbg==|1319649169|b82e24618b06d6b14d7ea64600c84a2d20c3de73"; defaultstat1=10; defaultstat3=10; SERVERID=ww4; PHPSESSID=53a7cd9acbb71ed7e7cc7be680e6c99c; __utmb=35782181.1.10.1319821717; __utmc=35782181; mode=full
Content-Type: application/x-www-form-urlencoded
Content-Length: 57

username=test_login&password=login123&btnLogin=Login

HTTP/1.0 302 Moved Temporarily
Server: nginx
Date: Fri, 28 Oct 2011 17:09:08 GMT
Content-Type: text/html
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: secret=99ba70c185973be0cd25e0f12dd1ea72; path=/
Location: REMOVED
X-Cache: MISS from REMOVED
Via: 1.0 REMOVED (http_scan/4.0.2.6.19)
Proxy-Connection: close

JSoup:

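import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;

// Submit the login form; the class is spelled Jsoup, not JSoup.
Connection.Response res = Jsoup.connect(url)
    .data("username", username)
    .data("password", password)
    .method(Connection.Method.POST)
    .execute();

// cookies() returns a name-to-value map of the cookies set by the response.
Map<String, String> cookies = res.cookies();

cookies only contains PHPSESSID and SERVERID.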

1 answer

  • doudizhi947129 2011-10-28 17:45

    The __utm* cookies in your sample are Google Analytics cookies, and they're set via JavaScript. Unless the crawler you're writing can execute JavaScript, those cookies will simply never get set in the crawler.

    What you see in your browser is irrelevant for fixing this. What counts is what the crawler sees, receives, and can do.
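
To make this concrete, here is a minimal Java sketch, using only the standard library, of what a crawler without a JavaScript engine has to work with: cookies reach it only through Set-Cookie response headers, such as the secret cookie in the 302 response above. The class CookieJarSketch and its two helper methods are hypothetical names made up for this illustration, and the PHPSESSID header line is an assumed form, since the question only shows that cookie on the request side.

```java
import java.net.HttpCookie;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, for illustration only; not part of JSoup or any library.
class CookieJarSketch {

    // Parse one raw Set-Cookie header line and store each cookie by name.
    static void storeSetCookie(Map<String, String> jar, String headerLine) {
        List<HttpCookie> parsed = HttpCookie.parse(headerLine);
        for (HttpCookie cookie : parsed) {
            jar.put(cookie.getName(), cookie.getValue());
        }
    }

    // Build the value of the Cookie request header for the next request.
    static String toCookieHeader(Map<String, String> jar) {
        StringJoiner joined = new StringJoiner("; ");
        for (Map.Entry<String, String> entry : jar.entrySet()) {
            joined.add(entry.getKey() + "=" + entry.getValue());
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        Map<String, String> jar = new LinkedHashMap<>();
        // Assumed header form; the question does not show this response header.
        storeSetCookie(jar, "Set-Cookie: PHPSESSID=53a7cd9acbb71ed7e7cc7be680e6c99c; path=/");
        // Taken from the 302 response in the question.
        storeSetCookie(jar, "Set-Cookie: secret=99ba70c185973be0cd25e0f12dd1ea72; path=/");
        // Only server-set cookies are available; the __utm* cookies never appear here.
        System.out.println("Cookie: " + toCookieHeader(jar));
    }
}
```

Anything the browser adds through JavaScript never passes through a step like storeSetCookie, which is why it is absent from the crawler's cookie map.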

    This answer was accepted by the asker as the best answer.
