dqhr76378 2016-08-15 04:07
浏览 55
已采纳

在php [复制]中修剪XML和奇怪的文本

This question already has an answer here:

I'm building RSS feed service, I'm dealing with articles which have unique format like this, I just want to fetch the content, not xml and particular styles or settings, I tried remove image base64 and strip tags and trim multiple spaces, but still there are a lot of weird content right there, how do I sanitize the data so I just get plain text This is paragraph text long content, Another paragraph text long content

<p align="justify"><!--[if gte mso 9]><xml>
 <w:WordDocument>
  <w:View>Normal</w:View>
  <w:Zoom>0</w:Zoom>
  <w:TrackMoves></w:TrackMoves>
  <w:TrackFormatting></w:TrackFormatting>
  ...
  </xml><![endif]--><!--[if gte mso 9]><xml>
 <w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
  DefSemiHidden="true" DefQFormat="false" DefPriority="99"
  LatentStyleCount="267">
  <w:LsdException Locked="false" Priority="0" SemiHidden="false"
   UnhideWhenUsed="false" QFormat="true" Name="Normal"></w:LsdException>
  <w:LsdException Locked="false" Priority="9" SemiHidden="false"
   UnhideWhenUsed="false" QFormat="true" Name="heading 1"></w:LsdException>
  <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 2"></w:LsdException>
</xml><![endif]--><!--[if gte mso 10]>
<style>
 /* Style Definitions */
 table.MsoNormalTable
    {mso-style-name:"Table Normal";
    mso-tstyle-rowband-size:0;
    mso-tstyle-colband-size:0;
    mso-style-noshow:yes;
mso-bidi-theme-font:minor-bidi;}
</style>
<![endif]-->

<p class="MsoNormal" align="justify">**This is paragraph text long content**</p><p class="MsoNormal" align="justify"> </p><br>

<p class="MsoNormal" align="justify">**Another paragraph text long content**</p>
</div>
  • 写回答

1条回答 默认 最新

  • dongshi949737 2016-08-15 04:53
    关注

    Part of my question was answered at How do you parse and process HTML/XML in PHP

    Extract messy and unwell formatted HTML content can use Simple HTML DOM Parser or relevant script tools.

    Thanks

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 抖音咸鱼付款链接转码支付宝
  • ¥15 ubuntu22.04上安装ursim-3.15.8.106339遇到的问题
  • ¥15 求螺旋焊缝的图像处理
  • ¥15 blast算法(相关搜索:数据库)
  • ¥15 请问有人会紧聚焦相关的matlab知识嘛?
  • ¥15 网络通信安全解决方案
  • ¥50 yalmip+Gurobi
  • ¥20 win10修改放大文本以及缩放与布局后蓝屏无法正常进入桌面
  • ¥15 itunes恢复数据最后一步发生错误
  • ¥15 关于#windows#的问题:2024年5月15日的win11更新后资源管理器没有地址栏了顶部的地址栏和文件搜索都消失了