x_i_a_o_b_a_i 2017-10-18 11:45 Acceptance rate: 0%
Views: 4442
Closed

Java anti-anti-crawler problem: I already have the __jsl_clearance value but still cannot fetch the data. Any help would be appreciated.

The code is as follows:
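// Imports used by this test method (Apache HttpClient 4.x; V8 is assumed to be J2V8's com.eclipsesource.v8.V8)
import java.io.IOException;
import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.HttpHost;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import com.eclipsesource.v8.V8;

@org.junit.Test
public void test8() throws ClientProtocolException, IOException {

    CloseableHttpClient client = HttpClients.createDefault();
    // Configure the proxy
    HttpHost proxy = new HttpHost("118.114.77.47", 8080, "http");
    RequestConfig config = RequestConfig.custom().setProxy(proxy).build();
    HttpGet get = new HttpGet("http://www.cnvd.org.cn/manufacturer/manufacturerListByStartWord?startWord=A");
    System.out.println(config);
    // Imitate a browser
    get.setConfig(config);
    // The header value must not repeat the header name ("Accept text/html..." was a typo)
    get.setHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
    get.setHeader("Accept-Encoding", "gzip, deflate");
    get.setHeader("Accept-Language", "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3");
    get.setHeader("Connection", "keep-alive");
    get.setHeader("Host", "www.cnvd.org.cn");
    get.setHeader("referer", "http://www.cnvd.org.cn/");
    get.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2");
    get.setHeader("Upgrade-Insecure-Requests", "1");

    CloseableHttpResponse response = client.execute(get);
    // The first request returns status 521 with a JavaScript challenge as the body
    if (response.getStatusLine().getStatusCode() == 521) {
        HttpEntity entity = response.getEntity();
        String html = EntityUtils.toString(entity, "utf-8");
        System.out.println(html);
        // Strip the <script> tags and unwrap eval(...) so that running the script returns the decoded JS as a string
        String js = html.trim().replace("<script>", "").replace("</script>", "").replace("eval(y.replace(/\\b\\w+\\b/g, function(y){return x[f(y,z)-1]}))", "y.replace(/\\b\\w+\\b/g, function(y){return x[f(y,z)-1]})");
        V8 runtime = V8.createV8Runtime();
        String jslClearance;
        try {
            String result = runtime.executeStringScript(js);
            System.out.println(result);
            // Second pass: keep only the statements from "var cd" through "dc+=cd;" (inside var l=function(){ ... })
            result = result.substring(result.indexOf("var cd"), result.indexOf("dc+=cd;") + 7);
            //result="var l=function(){ "+result+"return dc;}";
            System.out.println(result);

            // V8 has no DOM, so everything from "document" through "toLowerCase()" is replaced with the constant 'x'
            result = result.replaceAll("document.*toLowerCase\\(\\)", "'x'");
            // The value of the last statement (dc+=cd) is the "__jsl_clearance=..." pair
            jslClearance = runtime.executeStringScript(result);
            System.out.println(jslClearance);
        } finally {
            // Release the native V8 runtime even if script execution fails
            runtime.release();
        }
        // Take name=value from the first Set-Cookie header and append the computed clearance cookie
        Header[] cookies = response.getHeaders("Set-Cookie");
        String sessionCookie = cookies[0].getValue().split(";")[0];
        System.out.println(sessionCookie);
        get.setHeader("Cookie", sessionCookie + "; " + jslClearance);
        //get.setHeader("Cookie", jslClearance);

    }
    // Close the first response before reusing the request
    response.close();
    response = client.execute(get);
    // Fetch the page that is actually wanted
    HttpEntity entity = response.getEntity();
    String res = EntityUtils.toString(entity, "utf-8");
    System.err.println(res);

    //return res;

}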
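
The substring step in the code above assumes that both markers ("var cd" and "dc+=cd;") are present in the decoded script; if either is missing, indexOf returns -1 and substring throws. A minimal sketch of the same extraction with explicit checks follows. The class name, method name, and the sample script in main are made up for illustration and are not taken from the real site.

```java
final class ChallengeScriptSketch {

    // Markers copied from the test method; everything else here is illustrative.
    private static final String START = "var cd";
    private static final String END = "dc+=cd;";

    /**
     * Returns the span from START through END (inclusive), or null when
     * either marker is missing or END does not appear after START.
     */
    static String extractCookieStatements(String script) {
        int start = script.indexOf(START);
        if (start < 0) {
            return null;
        }
        int end = script.indexOf(END, start);
        if (end < 0) {
            return null;
        }
        return script.substring(start, end + END.length());
    }

    public static void main(String[] args) {
        // Made-up sample that only mimics the shape of the decoded challenge.
        String sample = "var l=function(){var cd,dc='__jsl_clearance=1|0|';"
                + "cd='abc';dc+=cd;document.cookie=dc;};";
        System.out.println(extractCookieStatements(sample));
        System.out.println(extractCookieStatements("<html>no challenge here</html>"));
    }
}
```

Returning null instead of throwing makes it easy to log the raw script when the site changes the challenge format.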


The output of the second request is still a piece of JavaScript rather than the page.
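
When the second response is still the challenge script, one thing worth ruling out is a malformed Cookie header. The sketch below is not a confirmed fix, only an illustration of building the header from name=value pairs joined with "; ". The class and method names are hypothetical, and the cookie values in main are placeholders.

```java
final class CookieHeaderSketch {

    /** Keeps only the leading name=value pair of a Set-Cookie value, dropping attributes such as Path. */
    static String nameValuePair(String setCookieValue) {
        int semicolon = setCookieValue.indexOf(';');
        String pair = semicolon < 0 ? setCookieValue : setCookieValue.substring(0, semicolon);
        return pair.trim();
    }

    /** Joins the name=value pairs with "; ", the separator used in a request Cookie header. */
    static String cookieHeader(String... setCookieValues) {
        StringBuilder header = new StringBuilder();
        for (String value : setCookieValues) {
            String pair = nameValuePair(value);
            if (pair.isEmpty()) {
                continue; // skip empty entries rather than emitting "; ; "
            }
            if (header.length() > 0) {
                header.append("; ");
            }
            header.append(pair);
        }
        return header.toString();
    }

    public static void main(String[] args) {
        // Placeholder values, not real cookies from the site.
        String sessionCookie = "example_session=0123abcd; max-age=31536000; path=/; HttpOnly";
        String clearance = "__jsl_clearance=1508298300.123|0|abcDEF%3D";
        System.out.println(cookieHeader(sessionCookie, clearance));
    }
}
```

Printing the resulting header next to the one a real browser sends (from the browser's network panel) makes differences easy to spot.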

1 answer

  • qq_35008324 2017-11-24 04:07

    May I ask how you added the V8 engine to your project?
