dongque4778 2012-03-05 23:01

PHP cURL: log in to a JSP website and return the HTML

I'm trying to use cURL to log in to a JSP/Tomcat website (we'll call it https://unknown.com for privacy reasons) and return the HTML from a page. I watched the Net panel in Firebug and the cookie panel in Firecookie to outline the manual steps below:

  1. Open web root - https://unknown.com
  2. Redirected to https://unknown.com/common/frames.jsp; cookie created: JSESSIONID
  3. Fill out j_username and j_password
  4. Post "j_username=user&j_password=pass&submit=logon" to https://unknown.com/common/j_security_check
  5. Redirect to https://unknown.com/common/frames.jsp
  6. User selects the link on the home page that leads to the HTML to be returned.

I don't have much experience with cURL and I'm not having much luck. I really just need to understand the steps cURL requires to log in to the site and reach the destination page.

EDIT: Here is my code:

//user login information
$username = "user";
$password = "pass";

$postData = "j_username=".$username."&j_password=".$password."&logon=submit";

$cookie_file = "/tmp/curl_cookies.txt";

//$fp = fopen($cookie_file, "w");
//fclose($fp);

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, 'https://unknown.com/common/j_security_check');
curl_setopt($ch, CURLOPT_POSTFIELDS,$postData);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
curl_setopt($ch, CURLOPT_REFERER, "https://unknown.com/common/Frames.jsp");
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$data = curl_exec($ch);
curl_close($ch);

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, 'https://unknown.com/claritymatch/ClarityBatchViewer.jsp?id=123');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$data = curl_exec($ch);

curl_close($ch);
echo $data;

It doesn't work the first time I run the .php file, but the second time it brings up the destination HTML. How can I get it to work on the first run? Also, since I'm storing the JSESSIONID cookie in the file indicated above, won't I run into problems with that session ID not changing, or will it change as needed?


1 answer

  • douzi1991 2012-03-06 00:26

    Here are a few suggestions for your situation...

    • Re-use the same curl handle for simplicity
      This reduces the need to duplicate options for each request. Set the majority of your options at the beginning and do it only once. I refer mostly to cookie options, user-agent, follow-location etc.
      You can then set the URL and request method for each individual request you make.
      You may also gain some performance by adding a Keep-Alive header to your requests: if the remote server supports it, the same connection is reused for multiple requests instead of reconnecting each time.

    • Set CURLOPT_FOLLOWLOCATION to true and start from the beginning
      Try to follow exactly what you see the browser do. That is, request the web root; if the site redirects you to the security check URL, cURL will follow that redirect and capture any cookies set in the process. One cURL request can result in multiple HTTP requests if a redirect is sent. Then proceed to "fill out" the login form.

    • Use http_build_query() for your post data
      There is nothing wrong with the way you set up your post string, but the data must be URL-encoded. An array passed to http_build_query() is easier to manipulate and yields a URL-encoded string you can feed directly to cURL.
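
    As a minimal sketch of the last suggestion, the login body can be built with http_build_query(). The helper name build_login_body() is hypothetical. The field names come from step 4 of the question, and the sample credentials are made up only to show the encoding of a space, "&" and "=".

```php
<?php
// Hypothetical helper: builds the URL-encoded body for the j_security_check POST.
// Field names are taken from step 4 of the question ("submit=logon").
function build_login_body(string $username, string $password): string
{
    // The separator is passed explicitly so the result does not depend on
    // the arg_separator.output ini setting.
    return http_build_query([
        'j_username' => $username,
        'j_password' => $password,
        'submit'     => 'logon',
    ], '', '&');
}

// Made-up credentials containing characters that must be encoded.
echo build_login_body('user name', 'p&ss=1'), "\n";
// prints: j_username=user+name&j_password=p%26ss%3D1&submit=logon
```

    The returned string can be passed as-is to CURLOPT_POSTFIELDS.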

    See also this answer I posted a couple of days ago for someone trying to do something similar. It includes references to other answers with full samples of requesting multiple URLs using cURL; looking at those should give you an idea of how to do what you want. In particular, see this answer, the first reference in the post I mentioned, which shows how to log in to Google by making several POST requests followed by a GET request.
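
    Putting the suggestions above together, the flow might look roughly like the sketch below. It is an illustration under assumptions, not a tested solution for this particular site: the function name fetch_after_login() and its parameters are hypothetical, the paths are the ones listed in the question, and TLS verification is left at cURL's defaults rather than disabled. An empty CURLOPT_COOKIEFILE turns on cURL's in-memory cookie engine, so each run starts with a fresh JSESSIONID instead of one saved to disk.

```php
<?php
// Sketch: one reused handle, three requests (web root, login POST, target page).
// Hypothetical function; the paths come from the steps listed in the question.
function fetch_after_login(string $base, string $user, string $pass, string $target): string
{
    $ch = curl_init();

    // Options shared by every request are set once on the reused handle.
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => '', // empty string: in-memory cookie engine, no file
    ]);

    // 1. Request the web root first so the server can issue JSESSIONID.
    curl_setopt($ch, CURLOPT_URL, $base . '/');
    if (curl_exec($ch) === false) {
        throw new RuntimeException('Initial request failed: ' . curl_error($ch));
    }

    // 2. Post the credentials; the session cookie is sent automatically.
    curl_setopt_array($ch, [
        CURLOPT_URL        => $base . '/common/j_security_check',
        CURLOPT_POST       => true,
        CURLOPT_POSTFIELDS => http_build_query([
            'j_username' => $user,
            'j_password' => $pass,
            'submit'     => 'logon',
        ], '', '&'),
    ]);
    if (curl_exec($ch) === false) {
        throw new RuntimeException('Login request failed: ' . curl_error($ch));
    }

    // 3. Switch back to GET and fetch the destination page.
    curl_setopt_array($ch, [
        CURLOPT_URL     => $base . $target,
        CURLOPT_HTTPGET => true,
    ]);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('Target request failed: ' . curl_error($ch));
    }

    curl_close($ch);
    return $html;
}
```

    A call such as fetch_after_login('https://unknown.com', 'user', 'pass', '/claritymatch/ClarityBatchViewer.jsp?id=123') would then return the page body on success, assuming the site behaves as described in the question.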
