My code is as follows:
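import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class TestURL {
    public static void main(String[] args) {
        String strUrl = "http://www.tlnews.cn/dzb/tlrb/html/2016-04/15/node_164.html";
        // Charset assumed here; the original post does not show which one is passed
        System.out.println(getUrlStr(strUrl, "UTF-8"));
    }

    public static String getUrlStr(String strUrl, String charSet) {
        String urlStr = "";
        try {
            URL url = new URL(strUrl);
            URLConnection uc = url.openConnection();
            // Browser-like request headers
            uc.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
            uc.setRequestProperty("Connection", "Keep-Alive");
            uc.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)");
            uc.connect();
            // For HTTP status >= 400, getInputStream() throws an IOException (here 403, then 521)
            try (InputStream is = uc.getInputStream();
                 BufferedReader br = new BufferedReader(new InputStreamReader(is, charSet))) {
                StringBuilder strs = new StringBuilder();
                String str;
                while ((str = br.readLine()) != null) {
                    strs.append(str).append("\r\n");
                }
                urlStr = strs.toString();
            } // try-with-resources closes the reader chain and the stream
        } catch (Exception e) {
            e.printStackTrace();
        }
        return urlStr;
    }
}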
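In the code above, getInputStream() throws as soon as the server answers with a status of 400 or higher, so only the status code ever becomes visible. A minimal sketch, assuming an http(s) URL so the connection can be cast to HttpURLConnection, reads the status with getResponseCode() (which does not throw for 4xx/5xx) and, on an error status, reads the body from getErrorStream() so the 403 or 521 page itself can be inspected. The class name StatusAwareFetch and the "status, blank line, body" return format are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;

// Hypothetical helper class; not part of the original post.
class StatusAwareFetch {

    // Reads a whole stream into a String using the given charset.
    static String readAll(InputStream in, String charSet) throws IOException {
        if (in == null) {
            return ""; // getErrorStream() may return null when there is no body
        }
        try (InputStream is = in) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = is.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), Charset.forName(charSet));
        }
    }

    // Returns "status\n\nbody"; for 4xx/5xx it reads the error stream instead of throwing.
    static String fetch(String strUrl, String charSet) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(strUrl).openConnection();
        conn.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)");
        int status = conn.getResponseCode(); // returns e.g. 403 or 521 instead of throwing
        InputStream body = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
        String text = readAll(body, charSet);
        conn.disconnect();
        return status + "\n\n" + text;
    }
}
```

Usage would be something like System.out.println(StatusAwareFetch.fetch("http://www.tlnews.cn/dzb/tlrb/html/2016-04/15/node_164.html", "UTF-8")); where the charset is again an assumption. Reading raw bytes and decoding once avoids the line-by-line readLine() loop, so the original line endings of the error page are preserved.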
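At first the request failed with a 403 error. After I added the line uc.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)"); it started failing with a 521 error instead. The page I want to scrape is http://www.tlnews.cn/dzb/tlrb/html/2016-04/15/node_164.html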
Error message:
java.io.IOException: Server returned HTTP response code: 521 for URL: http://www.tlnews.cn/dzb/tlrb/html/2016-04/14/node_164.html
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at wy.base.test.TestURL.getUrlStr(TestURL.java:32)
at wy.base.test.TestURL.main(TestURL.java:14)
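521 is not a status code defined by the HTTP specification; it is vendor-specific (Cloudflare, for example, uses it for "web server is down"). One possibility, not confirmed by the post, is that a CDN or anti-crawler layer in front of the site answers non-browser clients this way, so a cheap first check is to look at the raw status line and response headers (for example any Set-Cookie) of the 521 response. The sketch below does that without reading the body, so no IOException is raised; the class HeaderDump and its methods are hypothetical names for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper; names are illustrative only.
class HeaderDump {

    // Formats a header map as "Name: value" lines. HttpURLConnection stores the
    // status line (e.g. "HTTP/1.1 521 ...") under the null key.
    static String format(Map<String, List<String>> headers) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            for (String v : e.getValue()) {
                if (e.getKey() == null) {
                    sb.append(v);
                } else {
                    sb.append(e.getKey()).append(": ").append(v);
                }
                sb.append('\n');
            }
        }
        return sb.toString();
    }

    // Prints status and headers only; the body is never requested, so a 521 does not throw.
    static void dump(String strUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(strUrl).openConnection();
        conn.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)");
        System.out.println("Status: " + conn.getResponseCode());
        System.out.print(format(conn.getHeaderFields()));
        conn.disconnect();
    }
}
```

Calling HeaderDump.dump with the URL from the error message would print the status and each header on its own line; what those headers contain for this particular site is not known from the post.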