I'm building an RSS feed service and am dealing with articles that come in an unusual format like the one below. I only want the text content, not the embedded XML or the Word-specific styles and settings. I've tried removing base64-encoded images, stripping tags, and collapsing multiple spaces, but a lot of odd content is still left over. How do I sanitize the data so that I get just the plain text: "This is paragraph text long content, Another paragraph text long content"?
<p align="justify"><!--[if gte mso 9]><xml>
<w:WordDocument>
<w:View>Normal</w:View>
<w:Zoom>0</w:Zoom>
<w:TrackMoves></w:TrackMoves>
<w:TrackFormatting></w:TrackFormatting>
...
</xml><![endif]--><!--[if gte mso 9]><xml>
<w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
DefSemiHidden="true" DefQFormat="false" DefPriority="99"
LatentStyleCount="267">
<w:LsdException Locked="false" Priority="0" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Normal"></w:LsdException>
<w:LsdException Locked="false" Priority="9" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="heading 1"></w:LsdException>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 2"></w:LsdException>
</xml><![endif]--><!--[if gte mso 10]>
<style>
/* Style Definitions */
table.MsoNormalTable
{mso-style-name:"Table Normal";
mso-tstyle-rowband-size:0;
mso-tstyle-colband-size:0;
mso-style-noshow:yes;
mso-bidi-theme-font:minor-bidi;}
</style>
<![endif]-->
<p class="MsoNormal" align="justify">**This is paragraph text long content**</p><p class="MsoNormal" align="justify"> </p><br>
<p class="MsoNormal" align="justify">**Another paragraph text long content**</p>
</div>
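Below is a minimal sketch of one approach. It assumes Python 3 and only the standard-library `html.parser` module, because the question doesn't say which language the feed service uses. The leftover junk most likely comes from Word's conditional comments (`<!--[if gte mso 9]>...<![endif]-->`), which wrap the `<xml>` settings and the `<style>` block. A regex-style tag stripper removes the markers but keeps the text between them: `Normal`, `0`, and the CSS rules. A real HTML parser reports each of those comments as a single comment token, so the sketch can drop them whole. It also skips the contents of `style`, `script` and `xml` elements in case they appear outside a comment. The names `PlainTextExtractor`, `html_to_paragraphs`, `SKIP_TAGS` and `BLOCK_TAGS` are illustrative choices, not an existing API.

```python
from html.parser import HTMLParser

# Assumption: elements whose contents are never readable article text.
SKIP_TAGS = {"style", "script", "xml", "head", "title"}
# Assumption: elements that start a new paragraph in the plain-text output.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6", "tr"}


class PlainTextExtractor(HTMLParser):
    """Collect visible text per paragraph, ignoring all comments
    (including Word's <!--[if gte mso 9]>...<![endif]--> blocks)
    and the contents of SKIP_TAGS elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes &nbsp;, &amp;, ...
        self.paragraphs = []
        self._current = []
        self._skip_depth = 0

    def _flush(self):
        # Collapse whitespace runs; str.split() also treats \xa0 (&nbsp;) as whitespace.
        text = " ".join("".join(self._current).split())
        if text:
            self.paragraphs.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._current.append(data)

    def handle_comment(self, data):
        pass  # drop every comment, conditional or not

    def close(self):
        super().close()
        self._flush()


def html_to_paragraphs(html):
    """Hypothetical helper: return the non-empty text paragraphs of an HTML fragment."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# A shortened copy of the sample from the question (bold markers removed).
sample = """<p align="justify"><!--[if gte mso 9]><xml>
<w:WordDocument>
<w:View>Normal</w:View>
<w:Zoom>0</w:Zoom>
</w:WordDocument>
</xml><![endif]--><!--[if gte mso 10]>
<style>
table.MsoNormalTable
{mso-style-name:"Table Normal";}
</style>
<![endif]-->
<p class="MsoNormal" align="justify">This is paragraph text long content</p><p class="MsoNormal" align="justify">&nbsp;</p><br>
<p class="MsoNormal" align="justify">Another paragraph text long content</p>
</div>"""

if __name__ == "__main__":
    # prints: This is paragraph text long content, Another paragraph text long content
    print(", ".join(html_to_paragraphs(sample)))
```

Returning a list of paragraphs rather than one string leaves the separator up to the caller. Joining with `", "` gives the output asked for above, and joining with `"\n\n"` keeps one paragraph per block.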