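I'd like to ask about a problem that has been bothering me:
I'm using HttpClient to POST parameters and scrape the search results for a given query (the results span multiple pages, and each page is one request). Most of the scraped data is complete, but after roughly 100 pages some pages start coming back incomplete, intermittently. The incomplete pages lack the search results for the parameters I submitted (they contain only the static page content, i.e. no business data), yet the response status is 200 and no exception is thrown. I've been stuck on this for several days. Code: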
public String clientPost(String urll, String htmlbody) {
    String[] repParams = htmlbody.split("&");
    List<NameValuePair> data = new ArrayList<NameValuePair>();
    HttpPost post = new HttpPost(urll);
    for (String param : repParams) {
        data.add(new BasicNameValuePair(param.substring(0,
                param.indexOf("=")), param.substring(
                param.indexOf("=") + 1, param.length())));
    }
    try {
        // form parameters
        post.setEntity(new UrlEncodedFormEntity(data, "utf-8"));
        // execute post
        HttpResponse response = httpClient.execute(post);
        if (response.getStatusLine().getStatusCode() == 200) {
            a++;
            System.out.println("Request #" + a + " succeeded");
            HttpEntity entity = response.getEntity();
            BufferedReader read = new BufferedReader(new InputStreamReader(
                    entity.getContent(), "utf-8"));
            String currentLine;
            // declared length; may be negative when the server does not send one
            System.out.println("entity.getContentLength:"
                    + entity.getContentLength());
            StringBuffer buff = new StringBuffer();
            while ((currentLine = read.readLine()) != null) {
                buff.append(currentLine);
            }
            System.out.println("buff length:" + buff.length());
            if (buff.length() < 60000) { // treat a short body as incomplete data
                int i = 1;
                System.out.println("Page " + (a - i) + " requested " + (++i) + " times");
                // return the retried result instead of discarding it
                return clientPost(urll, htmlbody);
            } else {
                String urlContent = post.getURI().getRawPath()
                        + "\r\n"
                        + post.getRequestLine().toString()
                        + "\r\n"
                        + displayInfo(new UrlEncodedFormEntity(data,
                                "utf-8").getContent());
                logContent(urlContent, buff.toString(), a + ".html");
                System.out.println("buff:--------------"
                        + buff.toString().length());
                return buff.toString();
            }
        }
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
        logNetErr(e);
    } catch (ClientProtocolException e) {
        e.printStackTrace();
        logNetErr(e);
    } catch (IOException e) {
        e.printStackTrace();
        logNetErr(e);
    } finally {
        post.releaseConnection();
        httpClient.getConnectionManager().closeExpiredConnections();
    }
    return null;
}
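One thing I am considering is replacing the recursive call in clientPost with a bounded retry loop, so a page that keeps coming back without data cannot recurse forever. Below is a minimal sketch of that idea, not a fix for the underlying cause. Everything in it is hypothetical and not part of HttpClient: the class name BoundedRetry, the method fetchWithRetry, the marker string (some text assumed to appear only when the business data is present, used here instead of my 60000-character threshold), and the fake fetcher that stands in for the real HTTP call.

```java
import java.util.function.Supplier;

/** Sketch: retry a page fetch a limited number of times instead of recursing without limit. */
final class BoundedRetry {

    /**
     * Calls fetcher up to maxAttempts times and returns the first body that
     * contains marker (hypothetical text that only appears when the data is present).
     * Returns null if no attempt passes the check or the thread is interrupted.
     */
    static String fetchWithRetry(Supplier<String> fetcher, String marker,
                                 int maxAttempts, long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String body = fetcher.get();
            if (body != null && body.contains(marker)) {
                return body;
            }
            System.out.println("Attempt " + attempt + " of " + maxAttempts
                    + " looked incomplete");
            if (attempt < maxAttempts) {
                try {
                    // pause longer after each failed attempt
                    Thread.sleep(pauseMillis * attempt);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Stand-in for the real HTTP call: two pages without data, then a full page.
        Supplier<String> fakeFetcher = () -> {
            calls[0]++;
            return calls[0] < 3
                    ? "<html>static only</html>"
                    : "<html><table id=\"result\">rows</table></html>";
        };
        String page = fetchWithRetry(fakeFetcher, "id=\"result\"", 5, 10L);
        System.out.println("calls=" + calls[0] + ", gotData=" + (page != null));
    }
}
```

In my real code the Supplier would wrap the HttpClient request for one page. This only bounds the retries and slows them down; it does not tell me why the server answers 200 with a page that has no data.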
If anyone has run into this problem, I'd appreciate any advice.