weixin_33709219 2018-06-19 15:36

Calling Ajax with HtmlUnit

I want to crawl a web page that has a download button. When I press it, the page shows the download progress in its title and then shows a download link that can be clicked. I think this is done via Ajax, because I can see some requests in the developer console under Network -> XHR.

This is my code to crawl the site:

    WebClient webClient = new WebClient(BrowserVersion.FIREFOX_38);
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setCssEnabled(true);
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());
    final HtmlPage page = webClient.getPage("https://9xbuddy.com/process?url=https://www.fembed.com/v/6mv22g3qfsdfsd");
    // final ScriptResult scriptResult = page.executeJavaScript("beacon.js");
    webClient.waitForBackgroundJavaScript(10000);
    webClient.waitForBackgroundJavaScriptStartingBefore(10000);

But this code returns the page as it looks right after the button click, without the Ajax content loaded. I know which Ajax requests the site makes. Is there any way to issue those requests manually?

1 answer

  • weixin_33725722 2018-06-20 05:25

    You can construct the Ajax calls manually with HtmlUnit. If the Google Chrome console is not sufficient for identifying them, you can use a tool such as Fiddler. Once you have identified the HTTP call, you can reconstruct it with HtmlUnit like below:

    URL url = new URL(
            "http://tws.target.com/searchservice/item/search_results/v1/by_keyword?callback=getPlpResponse&navigation=true&category=55krw&searchTerm=&view_type=medium&sort_by=bestselling&faceted_value=&offset=60&pageCount=60&response_group=Items&isLeaf=true&parent_category_id=55kug&custom_price=false&min_price=from&max_price=to");
    WebRequest requestSettings = new WebRequest(url, HttpMethod.GET);
    
    requestSettings.setAdditionalHeader("Accept", "*/*");
    requestSettings.setAdditionalHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
    requestSettings.setAdditionalHeader("Referer", "http://www.target.com/c/xbox-one-games-video/-/N-55krw");
    requestSettings.setAdditionalHeader("Accept-Language", "en-US,en;q=0.8");
    requestSettings.setAdditionalHeader("Accept-Encoding", "gzip,deflate,sdch");
    requestSettings.setAdditionalHeader("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3");
    
    Page page = webClient.getPage(requestSettings);
    
    System.out.println(page.getWebResponse().getContentAsString());
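
Because the example URL passes `callback=getPlpResponse`, the body printed above is probably JSONP rather than plain JSON, i.e. the payload wrapped in `getPlpResponse(...)`. That is an assumption about how the endpoint behaves, not something confirmed here. A minimal sketch for stripping the wrapper before handing the text to a JSON parser could look like this; the class name `JsonpUnwrap`, the `unwrap` helper, and the sample body are all hypothetical:

```java
public class JsonpUnwrap {

    /**
     * Returns the text between "callback(" and the last ")".
     * If the body is not wrapped in the given callback, it is returned
     * trimmed but otherwise unchanged.
     */
    static String unwrap(String body, String callback) {
        String trimmed = body.trim();
        String prefix = callback + "(";
        int close = trimmed.lastIndexOf(')');
        if (!trimmed.startsWith(prefix) || close < prefix.length()) {
            return trimmed; // not JSONP (or malformed): leave as-is
        }
        return trimmed.substring(prefix.length(), close).trim();
    }

    public static void main(String[] args) {
        // Made-up response body; the real service's payload is not shown in this thread.
        String sample = "getPlpResponse({\"items\":[1,2,3]});";
        System.out.println(unwrap(sample, "getPlpResponse"));
    }
}
```

In the snippet above you would pass `page.getWebResponse().getContentAsString()` as `body`. If the response turns out to be plain JSON, the helper simply returns it trimmed.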
    