douyuai8994 2011-04-06 20:21
Accepted

Getting HTML output with PHP (cleaned text)

Do you know of any PHP function that cleans up HTML code (fetched with cURL) and returns only the visible text (what the browser would actually show)? Thanks.


3 answers

  • duanhe6718 2011-04-06 20:25

    This is harder than you'd think. The obvious simple solution is to run strip_tags() over it, but that only removes the tags and leaves all text content intact, including embedded JavaScript and CSS, as well as text inside elements that are normally hidden (e.g. by setting display: none on them). You could try some regex magic to filter out the parts you're not interested in, but regular expressions on HTML are generally a bad idea for anything nontrivial. The ultimate solution, I'm afraid, is to use a proper HTML parser and then pull the actual text out of the resulting DOM tree; by the time you have that, you'll be pretty close to implementing a web browser.
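
To make the parser-based approach concrete, here is a minimal sketch using PHP's bundled DOM extension. The function name visibleText() is hypothetical, and the list of elements it drops is an assumption about what counts as "not visible": it only catches script/style-like elements, the hidden attribute, and an inline display:none. Anything hidden through external stylesheets or classes, or generated by JavaScript, would need a real rendering engine.

```php
<?php
// Hypothetical helper (not from the original answer): approximates the text
// a browser would display by parsing the HTML and dropping non-rendered nodes.
function visibleText(string $html): string
{
    if (trim($html) === '') {
        return '';
    }

    $doc = new DOMDocument();
    // Real-world markup is often invalid; collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    // Note: loadHTML() assumes ISO-8859-1 unless the markup declares a charset.
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumption: these are the nodes treated as "not visible".
    $query = '//head | //script | //style | //noscript | //template'
           . ' | //*[@hidden]'
           . ' | //*[contains(translate(@style, " ", ""), "display:none")]';

    foreach ($xpath->query($query) as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    $root = $doc->documentElement;
    if ($root === null) {
        return '';
    }

    // Collapse runs of whitespace so the result reads like flowing text.
    return trim(preg_replace('/\s+/u', ' ', $root->textContent) ?? '');
}

$html = '<html><head><title>T</title><style>p{color:red}</style></head>'
      . '<body><p>Hello <b>world</b></p> <script>alert(1)</script>'
      . '<div style="display: none">secret</div><p hidden>gone</p></body></html>';

echo visibleText($html), "\n";
```

One limitation of this sketch: textContent simply concatenates text nodes, so adjacent block elements with no whitespace between them (e.g. `<p>a</p><p>b</p>`) run together. Walking the tree and inserting separators after block-level elements would be the next step.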

    Selected by the asker as the best answer.