dongyin2390 2016-07-23 16:39
Accepted

JavaScript functions to synchronously scrape HTML and JS

Is there a library that would support synchronous JavaScript functions like the following?

function getPageHTML(url){
     // scrape HTML from external web page
     return html;
}

function getPageJS(url){
     // scrape final JavaScript variable results from external web page
     return js;
}

I like the concept behind pjscrape, but I don't want to use a command-line script. I don't mind using PHP, but I want my function to be synchronous.


1 answer

  • dongyue7796 2016-07-23 17:31

    No JavaScript environment recommends synchronous networking for retrieving data from an external server; that is simply not how the language is designed. JavaScript uses asynchronous I/O: the result is delivered via a promise or a callback and cannot be returned directly from your function call.

    The "A" in "Ajax" stands for asynchronous, and that is a cornerstone of making network requests from JavaScript in the browser. The browser can technically make a synchronous Ajax call, but doing so is discouraged for a variety of reasons (for example, it hangs the browser UI for the duration of the call), and it is being deprecated in many circumstances because synchronous Ajax is almost never a good idea. In addition, Ajax calls from the browser are limited to the origin your web page came from or to servers that explicitly allow cross-origin requests. So you can't expect an Ajax call from a web page to fetch an arbitrary page on the internet; most other sites will not be reachable that way.

    What the browser is good at is asynchronous networking, where the result is delivered via a callback or promise at some point in the future and the rest of your JavaScript continues to run in the meantime. This is how you should code your network requests.
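To illustrate the point about results arriving later, here is a minimal sketch. `fakeRequest` is a hypothetical stand-in for a real network call (it only simulates a delay), and the example assumes an engine that supports `async`/`await`, which was standardized after this answer was written. The syntax makes the function read almost like the synchronous version from the question, but the call still returns a promise rather than the HTML itself.

```javascript
// Hypothetical stand-in for a network request: resolves after a short delay.
function fakeRequest(url) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve("<html>content of " + url + "</html>");
    }, 10);
  });
}

// Records the order in which things happen.
const order = [];

// Reads like synchronous code, but always returns a promise.
async function getPageHTML(url) {
  const html = await fakeRequest(url); // pauses this function only
  order.push("response arrived");
  return html;
}

const pending = getPageHTML("http://example.com/");
// This line runs before the simulated response arrives.
order.push("code after the call keeps running");

pending.then(function (html) {
  console.log(order.join(" -> "));
  console.log(html);
});
```

The caller still has to use `.then()` or `await` to get at the value; `async`/`await` changes how the code reads, not when the result becomes available.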

    If you want to get scraped results from some external site into a browser, the preferred architecture is to set up a server that does the work for you. The JavaScript in your web page makes an Ajax call to your own server, asking it to scrape a specific web site. The server (which has no cross-origin limitations on which hosts it can make requests to) then fetches the content, scrapes it into the desired results, and returns the scraped data in response to your Ajax call.
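A rough sketch of the server side of that architecture, using only Node.js built-ins. Everything named here is an assumption for illustration: the `/scrape` endpoint, the `createScrapeServer` and `fetchPage` helpers, and the JSON response shape are not from the original answer, and the default fetcher assumes Node.js 18 or later, where `fetch` is available globally. The actual scraping step is left as a comment.

```javascript
const http = require("http");

// Default page fetcher (assumes Node.js 18+, where fetch is a global).
async function fetchPage(targetUrl) {
  const response = await fetch(targetUrl);
  if (!response.ok) {
    throw new Error("Upstream responded with status " + response.status);
  }
  return response.text();
}

// Builds a server exposing the hypothetical endpoint GET /scrape?url=<page>.
// The fetcher is injectable so the handler can be exercised without a network.
function createScrapeServer(fetcher = fetchPage) {
  return http.createServer(async function (req, res) {
    const requestUrl = new URL(req.url, "http://localhost");
    const target = requestUrl.searchParams.get("url");

    if (requestUrl.pathname !== "/scrape" || !target) {
      res.writeHead(400, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: "Expected /scrape?url=<page>" }));
      return;
    }

    try {
      const html = await fetcher(target);
      // A real scraper would extract the desired fields from html here.
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ url: target, html: html }));
    } catch (err) {
      // The external site could not be fetched.
      res.writeHead(502, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: err.message }));
    }
  });
}

// To run it (port 3000 is an arbitrary choice):
// createScrapeServer().listen(3000);
```

A real deployment would also restrict which URLs may be requested, since an endpoint that fetches any address on demand can be abused.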


    So you could design a promise-based interface in your client that works asynchronously, like this:

    getPageJS(someUrl).then(function(data) {
        // process data here
    }).catch(function(err) {
        // process error here
    });
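One way `getPageJS` itself might look on the client, as a sketch only. It assumes your own server exposes a `/scrape?url=...` endpoint that responds with JSON; that endpoint name and the `fetchImpl` parameter are hypothetical choices made for this example, not part of any library.

```javascript
// Hypothetical client helper: asks your own server to scrape a page.
// "/scrape" is an assumed endpoint on the same origin as the page;
// fetchImpl defaults to the global fetch and can be swapped for testing.
function getPageJS(url, fetchImpl = fetch) {
  const endpoint = "/scrape?url=" + encodeURIComponent(url);
  return fetchImpl(endpoint).then(function (response) {
    if (!response.ok) {
      // Reject so the caller's .catch() handler runs.
      throw new Error("Scrape request failed with status " + response.status);
    }
    return response.json(); // resolves with the scraped data
  });
}
```

Because the request goes to your own origin, the browser's cross-origin restrictions do not apply to it, and the caller handles the result with `.then()` and `.catch()` exactly as in the snippet above.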
    
    The asker selected this answer as the best answer.
