douji9518 2009-11-20 06:34
80 views
Accepted

Emulating JavaScript outside a browser - is it possible?

I am working on a new project that involves fetching a web page (using PHP and cURL), parsing the HTML and JavaScript out of it, and then handling the resulting data.

Basically, I hit a brick wall when the site uses JavaScript to fetch its data via AJAX. In that case, the initial data does not appear in the fetched page unless the JavaScript is run in a browser.

Are there any PHP libraries for this? (I suspect not, but I could be wrong.)

I would really rather build this as a server-based solution. Otherwise I am forced to build a separate application for this using the Mozilla and/or IE runtime libraries, which rather defeats the purpose.


8 answers

  • duanbage2161 2009-11-20 14:03

    You will need:

    • one JavaScript interpreter
    • one DOM Level 2 Core and HTML implementation
    • 500g of non-standard but commonly-used DOM extensions
    • a pinch of DOM Level 2 Style (which might mean also a CSS interpreter and layout engine)
    • yoghurt pots, round-ended scissors and sticky-back plastic

    Once you have assembled your components (remember to get a grown-up to help you with the sandboxing), you'll find what you have is essentially indistinguishable from a web browser.

    "JAVA is not part of the shell build on the server. V8/SquirrelFish is C++ code I would need to convert to PHP."

    Porting a JS engine to PHP would be a huge task, and the resulting performance would likely be horrible. You can't even really get away with a near-solution for JavaScript any more, since so many pages use hideously complex libraries like jQuery to do everything, and those require in-depth JS support.

    I don't think you're going to be able to do this purely in PHP. You'll have to hook up Java/Rhino/HtmlUnit or a proper web browser like Mozilla. If your hosting environment doesn't give you the flexibility you need to compile and deploy that sort of thing, you'd have to move to a better hosting setup with a shell (preferably a VPS).

    If you can avoid this unpleasantness some other way, by special-casing known pages' AJAX access, do that.
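
    As a rough illustration of special-casing a known page, here is a minimal sketch that requests the data endpoint the page's own JavaScript would call and decodes the result, so no JavaScript has to run at all. The URL, the `items` key, and the helper names `fetch_ajax` and `extract_items` are all made up for the example, and it assumes the endpoint returns JSON; the real URL and response shape have to be found by watching the page's network traffic.

```php
<?php
// Hypothetical endpoint: replace with the URL the page's JavaScript requests.
const AJAX_URL = 'https://example.com/ajax/items?page=1';

/**
 * Fetch the raw response body of an AJAX endpoint with cURL.
 * Returns the body as a string, or null if the request fails.
 */
function fetch_ajax($url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_TIMEOUT        => 15,    // seconds
        // Some endpoints only answer requests that look like XMLHttpRequest calls.
        CURLOPT_HTTPHEADER     => array(
            'X-Requested-With: XMLHttpRequest',
            'Accept: application/json',
        ),
    ));
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

/**
 * Decode a JSON body and return the list stored under $key.
 * The default key "items" is an assumption about the response shape.
 * Returns an empty array if the body is not JSON or the key is missing.
 */
function extract_items($body, $key = 'items')
{
    $data = json_decode($body, true);    // true => decode objects as arrays
    if (!is_array($data) || !isset($data[$key]) || !is_array($data[$key])) {
        return array();
    }
    return $data[$key];
}
```

    Typical use would be `$body = fetch_ajax(AJAX_URL);` followed by `$items = $body === null ? array() : extract_items($body);`. This only works for pages whose endpoints you have inspected by hand, and it breaks whenever the site changes them.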

    This answer was selected by the asker as the best answer.