dqhr76378 2016-08-15 04:07
浏览 55
已采纳

在php [复制]中修剪XML和奇怪的文本

This question already has an answer here:

I'm building RSS feed service, I'm dealing with articles which have unique format like this, I just want to fetch the content, not xml and particular styles or settings, I tried remove image base64 and strip tags and trim multiple spaces, but still there are a lot of weird content right there, how do I sanitize the data so I just get plain text This is paragraph text long content, Another paragraph text long content

<p align="justify"><!--[if gte mso 9]><xml>
 <w:WordDocument>
  <w:View>Normal</w:View>
  <w:Zoom>0</w:Zoom>
  <w:TrackMoves></w:TrackMoves>
  <w:TrackFormatting></w:TrackFormatting>
  ...
  </xml><![endif]--><!--[if gte mso 9]><xml>
 <w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
  DefSemiHidden="true" DefQFormat="false" DefPriority="99"
  LatentStyleCount="267">
  <w:LsdException Locked="false" Priority="0" SemiHidden="false"
   UnhideWhenUsed="false" QFormat="true" Name="Normal"></w:LsdException>
  <w:LsdException Locked="false" Priority="9" SemiHidden="false"
   UnhideWhenUsed="false" QFormat="true" Name="heading 1"></w:LsdException>
  <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 2"></w:LsdException>
</xml><![endif]--><!--[if gte mso 10]>
<style>
 /* Style Definitions */
 table.MsoNormalTable
    {mso-style-name:"Table Normal";
    mso-tstyle-rowband-size:0;
    mso-tstyle-colband-size:0;
    mso-style-noshow:yes;
mso-bidi-theme-font:minor-bidi;}
</style>
<![endif]-->

<p class="MsoNormal" align="justify">**This is paragraph text long content**</p><p class="MsoNormal" align="justify"> </p><br>

<p class="MsoNormal" align="justify">**Another paragraph text long content**</p>
</div>
  • 写回答

1条回答 默认 最新

  • dongshi949737 2016-08-15 04:53
    关注

    Part of my question was answered at How do you parse and process HTML/XML in PHP

    Extract messy and unwell formatted HTML content can use Simple HTML DOM Parser or relevant script tools.

    Thanks

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥100 求数学坐标画圆以及直线的算法
  • ¥100 c语言,请帮蒟蒻写一个题的范例作参考
  • ¥15 名为“Product”的列已属于此 DataTable
  • ¥15 安卓adb backup备份应用数据失败
  • ¥15 eclipse运行项目时遇到的问题
  • ¥15 关于#c##的问题:最近需要用CAT工具Trados进行一些开发
  • ¥15 南大pa1 小游戏没有界面,并且报了如下错误,尝试过换显卡驱动,但是好像不行
  • ¥15 自己瞎改改,结果现在又运行不了了
  • ¥15 链式存储应该如何解决
  • ¥15 没有证书,nginx怎么反向代理到只能接受https的公网网站