iteye_20402 2014-12-01 19:09

Java: fetching Xueqiu data keeps failing. Am I being blocked, or are my parameters wrong?

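I've tried many different parameters and always get the same "Server returned HTTP response code: 400 for URL". I don't know whether Xueqiu imposes some restriction; I set the request up exactly like the browser's request and it still fails. Thanks in advance!
I also tried the same thing with jsoup and got the same error.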
[code="java"]package com.test;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;

import org.json.JSONException;
import org.json.JSONObject;

public class test {

    // Reads everything from the reader into a String, one char at a time.
    private static String readAll(Reader rd) throws IOException {
        StringBuilder sb = new StringBuilder();
        int cp;
        while ((cp = rd.read()) != -1) {
            sb.append((char) cp);
        }
        return sb.toString();
    }

    public static JSONObject readJsonFromUrl(String url) throws IOException, JSONException {
        URL u = new URL(url);
        HttpURLConnection uc = (HttpURLConnection) u.openConnection();
        // Only these two headers are set here; the browser request captured
        // below also carries Accept, Accept-Language, Referer and Cookie.
        uc.setRequestProperty("X-Requested-With", "XMLHttpRequest");
        uc.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36 LBBROWSER");
        // give it 15 seconds to respond
        uc.setReadTimeout(15 * 1000);
        uc.connect();
        // getInputStream() throws an IOException for any 4xx/5xx status,
        // which is where the "response code: 400" exception comes from.
        InputStream is = uc.getInputStream();
        try {
            BufferedReader rd = new BufferedReader(new InputStreamReader(is, Charset.forName("UTF-8")));
            String jsonText = readAll(rd);
            return new JSONObject(jsonText);
        } finally {
            is.close();
        }
    }

    public static void main(String[] args) throws IOException, JSONException {
        // Configure the HTTP proxy. "proxySet" is not one of the documented JDK
        // networking properties; http.proxyHost/http.proxyPort are what take effect.
        System.setProperty("proxySet", "true");
        System.setProperty("http.proxyHost", "cn-proxy.xxx.com");
        System.setProperty("http.proxyPort", "80");
        JSONObject json = readJsonFromUrl("http://xueqiu.com/stock/cata/stocklist.json?page=1&size=90&order=desc&orderby=name&exchange=CN&industry=%E5%9B%9E%E8%B4%AD&flag=1&_=1417428721184");
        System.out.println(json.toString());
    }

} [/code]

The resulting exception:
[code="java"]Exception in thread "main" java.io.IOException: Server returned HTTP response code: 400 for URL: http://xueqiu.com/stock/cata/stocklist.json?page=1&size=90&order=desc&orderby=name&exchange=CN&industry=%E5%9B%9E%E8%B4%AD&flag=1&_=1417428721184
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1241)
at com.test.test.readJsonFromUrl(test.java:37)
at com.test.test.main(test.java:54) [/code]
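When HttpURLConnection.getInputStream() sees a 4xx/5xx status it throws an IOException like the one above, which hides whatever the server put in the response body. Checking getResponseCode() first and reading getErrorStream() on failure often shows why a request was rejected. The sketch below is a minimal illustration, not a fix: HttpBody and fetch are hypothetical names, and the demo runs against a local com.sun.net.httpserver.HttpServer that always answers 400 with a made-up JSON error, standing in for the real site so the example works offline.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class HttpBody {
    /** Status code plus response body, returned whether or not the request succeeded. */
    static final class Result {
        final int status;
        final String body;
        Result(int status, String body) { this.status = status; this.body = body; }
    }

    static Result fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(15_000);
        conn.setReadTimeout(15_000);
        // Unlike getInputStream(), getResponseCode() does not throw on 4xx/5xx.
        int status = conn.getResponseCode();
        // For error statuses the body is exposed via getErrorStream() (may be null).
        InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
        String body = (in == null) ? "" : readFully(in);
        conn.disconnect();
        return new Result(status, body);
    }

    private static String readFully(InputStream in) throws IOException {
        try (InputStream is = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = is.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Offline demo against a hypothetical local stand-in server that always answers 400. */
    static Result demo() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                byte[] msg = "{\"error\":\"demo rejection\"}".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(400, msg.length);
                exchange.getResponseBody().write(msg);
                exchange.close();
            });
            server.start();
            try {
                return fetch("http://127.0.0.1:" + server.getAddress().getPort() + "/stock/cata/stocklist.json");
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Result r = demo();
        System.out.println(r.status + " " + r.body);
    }
}
```

Pointed at the real Xueqiu URL instead of the local server, fetch would print whatever error text the site returns with its 400, which is usually more informative than the stack trace.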

This URL returns a JSON document; its content looks roughly like the following (open the link to see all of it):
[code="java"]{"count":{"count":19.0},"success":"true","stocks":[{"symbol":"SZ395032","code":"395032","name":"债券回购","pettm":"","volume":"395277740","hasexist":"false","marketcapital":"0.0","current":"0.0","percent":"0.0","change":"0.0","high":"0.0","low":"0.0","high52w":"0.0","low52w":"0.0","trading_date":"","trading_days":"","actual_date":"","actual_days":"","net_profit":"","net_profit_day":"","net_profit_yield":"","net_cost":"","net_cost_day":"","net_cost_yield":""},{"symbol":"SH204001","code":"204001","name":"GC001","pettm":"","volume":"370904900","hasexist":"false","marketcapital":"0.0","current":"5.025","percent":"139.29","change":"2.925","high":"7.0","low":"4.0","high52w":"50.5","low52w":"0.1","trading_date":"","trading_days":"","actual_date":"","actual_days":"","net_profit":"","net_profit_day":"","net_profit_yield":"","net_cost":"","net_cost_day":"","net_cost_yield":""}]}[/code]
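The query string itself is unremarkable: industry=%E5%9B%9E%E8%B4%AD is just 回购 ("repo") percent-encoded as UTF-8, and _=1417428721184 looks like a millisecond timestamp, the kind of cache-busting parameter jQuery appends when caching is disabled. A minimal sketch that rebuilds the same URL from explicit parameters instead of a pasted string; StockListUrl.build is a hypothetical helper, and passing the current time for _ is an assumption about what the server accepts.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class StockListUrl {
    /** Hypothetical helper: rebuilds the stocklist.json query from explicit parameters. */
    static String build(int page, int size, String industry, long timestampMillis) {
        try {
            // URLEncoder produces form encoding (%XX in upper case); for this
            // value it matches what the browser sent.
            String encodedIndustry = URLEncoder.encode(industry, "UTF-8");
            return "http://xueqiu.com/stock/cata/stocklist.json"
                    + "?page=" + page
                    + "&size=" + size
                    + "&order=desc&orderby=name&exchange=CN"
                    + "&industry=" + encodedIndustry
                    + "&flag=1"
                    + "&_=" + timestampMillis; // assumed cache-busting timestamp
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // "\u56de\u8d2d" is 回购 written as Unicode escapes, independent of source encoding.
        System.out.println(build(1, 90, "\u56de\u8d2d", System.currentTimeMillis()));
    }
}
```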

The request headers the browser sent:
[code="java"]Remote Address:146.56.234.217:80
Request URL:http://xueqiu.com/stock/cata/stocklist.json?page=1&size=90&order=desc&orderby=name&exchange=CN&industry=%E5%9B%9E%E8%B4%AD&flag=1&_=1417428721184
Request Method:GET
Status Code:200 OK
Request Headers
Accept:application/json, text/javascript, */*; q=0.01
Accept-Encoding:gzip,deflate,sdch
Accept-Language:zh-CN,zh;q=0.8
Cache-Control:max-age=0
Cookie:bid=2a0ffaa0c8c292e9752b4f52fa2e1a8e_i2zlirov; snbim_minify=true; last_account=35159618%40qq.com; xq_a_token=iXFl2kLorOVMsEDZ78hkeg; xq_r_token=Hs8CSFgNGhjhS6App0McWe; __utmt=1; Hm_lvt_1db88642e346389874251b5a1eded6e3=1417060711,1417146659,1417410467; Hm_lpvt_1db88642e346389874251b5a1eded6e3=1417428721; __utma=1.1126280283.1417060711.1417420694.1417428549.11; __utmb=1.2.10.1417428549; __utmc=1; __utmz=1.1417410467.8.2.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided)
Host:xueqiu.com
Proxy-Connection:keep-alive
Referer:http://xueqiu.com/hq
User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36 LBBROWSER
X-Requested-With:XMLHttpRequest[/code]
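Compared with these browser headers, the Java code above sets only User-Agent and X-Requested-With; it sends no Cookie, Referer or Accept. One plausible but unconfirmed explanation for the 400 is that the endpoint expects the session cookies the browser already holds, such as xq_a_token. The sketch below copies browser-like headers onto an HttpURLConnection. BrowserHeaders, browserLike and fetchStatus are hypothetical names, the cookie value is a placeholder, and the local HttpServer that rejects requests lacking an xq_a_token cookie is an assumed stand-in; the real site may check something else entirely.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

final class BrowserHeaders {
    /** Headers taken from the browser capture; the cookie value is a placeholder. */
    static Map<String, String> browserLike(String cookie) {
        Map<String, String> h = new LinkedHashMap<>();
        h.put("Accept", "application/json, text/javascript, */*; q=0.01");
        h.put("Accept-Language", "zh-CN,zh;q=0.8");
        h.put("Referer", "http://xueqiu.com/hq");
        h.put("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                + "(KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36 LBBROWSER");
        h.put("X-Requested-With", "XMLHttpRequest");
        if (cookie != null) h.put("Cookie", cookie);
        // Accept-Encoding is left out on purpose: HttpURLConnection does not
        // decompress gzip responses automatically.
        return h;
    }

    static int fetchStatus(String url, Map<String, String> headers) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            conn.setRequestProperty(e.getKey(), e.getValue());
        }
        int status = conn.getResponseCode(); // does not throw on 4xx
        conn.disconnect();
        return status;
    }

    /** Offline demo: returns {status without cookie, status with cookie}. */
    static int[] demo() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            // Hypothetical stand-in: 200 only if an xq_a_token cookie is present.
            server.createContext("/", exchange -> {
                String cookie = exchange.getRequestHeaders().getFirst("Cookie");
                int code = (cookie != null && cookie.contains("xq_a_token=")) ? 200 : 400;
                exchange.sendResponseHeaders(code, -1); // -1: no response body
                exchange.close();
            });
            server.start();
            try {
                String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/stock/cata/stocklist.json";
                return new int[] {
                    fetchStatus(url, browserLike(null)),
                    fetchStatus(url, browserLike("xq_a_token=PLACEHOLDER"))
                };
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] s = demo();
        System.out.println("without cookie: " + s[0] + ", with cookie: " + s[1]);
    }
}
```

Against the real site, the cookie string would have to come from an actual browser session, and copied tokens typically expire, so this is at best a way to test the hypothesis.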


1 answer

  • zyn010101 2014-12-03 16:07

    Try switching to HtmlUnit: set up a user and simulate a browser.

    This answer was selected by the asker as the best answer.
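HtmlUnit works partly because it keeps a browser-style cookie jar. If the missing piece really is the session cookie (an assumption; the thread does not confirm it), the JDK's own java.net.CookieManager can play that role without a browser engine: request the home page first so the server can set its cookies, then request the JSON and let the stored cookies be attached automatically. This is a swapped-in technique, not HtmlUnit's API. CookieSession is a hypothetical name, and the local HttpServer that hands out xq_a_token on its home page is an assumed stand-in for the real site.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;

final class CookieSession {
    static int get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("X-Requested-With", "XMLHttpRequest");
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    /** Offline demo: returns {JSON status before visiting the home page, JSON status after}. */
    static int[] demo() {
        CookieHandler previous = CookieHandler.getDefault();
        // Process-wide cookie jar: HttpURLConnection now stores Set-Cookie
        // responses and sends matching cookies on later requests.
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ORIGINAL_SERVER));
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            // Hypothetical home page: hands out a session cookie.
            server.createContext("/", exchange -> {
                exchange.getResponseHeaders().add("Set-Cookie", "xq_a_token=demo; Path=/");
                exchange.sendResponseHeaders(200, -1);
                exchange.close();
            });
            // Hypothetical JSON endpoint: rejects requests without the cookie.
            server.createContext("/stock/cata/stocklist.json", exchange -> {
                String cookie = exchange.getRequestHeaders().getFirst("Cookie");
                int code = (cookie != null && cookie.contains("xq_a_token=")) ? 200 : 400;
                exchange.sendResponseHeaders(code, -1);
                exchange.close();
            });
            server.start();
            try {
                String base = "http://127.0.0.1:" + server.getAddress().getPort();
                int before = get(base + "/stock/cata/stocklist.json");
                get(base + "/"); // visit the home page first to collect cookies
                int after = get(base + "/stock/cata/stocklist.json");
                return new int[] {before, after};
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            CookieHandler.setDefault(previous);
        }
    }

    public static void main(String[] args) {
        int[] s = demo();
        System.out.println("before: " + s[0] + ", after: " + s[1]);
    }
}
```

Against the real site, the equivalent would be calling get("http://xueqiu.com/") before the stocklist.json URL; whether that alone satisfies Xueqiu is untested here.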
