duanmiaosi0150 2013-11-21 11:10

too long

I have the following data:

<description>&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;&lt;div class="MsoNormal"&gt;&lt;i&gt;&lt;span style="font-family: Georgia, Times New Roman, serif; font-size: xx-small;"&gt;By Marina Correa&lt;/span&gt;&lt;/i&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;i&gt;&lt;span style="font-family: Georgia, Times New Roman, serif; font-size: xx-small;"&gt;Photography: Courtesy the architect&lt;/span&gt;&lt;span style="font-family: Georgia, serif; font-size: 9pt;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br&gt;&lt;/div&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-D1JRy4epwOM/UooCcR-U7lI/AAAAAAAALyM/tDr2ezxnb-I/s1600/Prost_Beer_+House_AH_Design_Indiaartndesign.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img alt="Prost Beer House in Bengaluru, India,by AH design." 
border="0" src="http://3.bp.blogspot.com/-D1JRy4epwOM/UooCcR-U7lI/AAAAAAAALyM/tDr2ezxnb-I/s1600/Prost_Beer_+House_AH_Design_Indiaartndesign.jpg" title=""&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: right;"&gt;&lt;span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"&gt;.&lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="MsoNormal"&gt;&lt;br&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;span style="font-family: Georgia, &amp;#39;Times New Roman&amp;#39;, serif;"&gt;Evolving from carnage of shipwrecked metal, the interiors of Prost Beer House in Bengaluru, India, make it an attention-grabbing drinking hole…&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;a href="http://inditerrain.indiaartndesign.com/2013/11/beerhouse-rock.html#more"&gt;Read more »&lt;/a&gt;&lt;img src="http://feeds.feedburner.com/~r/IndiaArtNDesign/~4/jGC75D3KB0o" height="1" width="1"/&gt;</description>

However, instead of "<" I have "&lt;", and instead of ">" I have "&gt;".
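The data appears to come from an RSS `<description>` element, which stores embedded HTML as text with `<` and `>` escaped as `&lt;` and `&gt;`. Here is a minimal sketch, assuming Python and that the whole element is available as a string. A standard XML parser such as `xml.etree.ElementTree` decodes those entities when it returns the element text, so you may not need to handle the escaped form at all. The helper name `description_html` is hypothetical.

```python
import xml.etree.ElementTree as ET


def description_html(item_xml: str) -> str:
    """Return the text of a <description> element with XML entities decoded."""
    element = ET.fromstring(item_xml)
    # ElementTree resolves &lt; / &gt; / &amp; in character data, so the
    # returned string contains literal "<" and ">" characters.
    return element.text or ""
```

For example, `description_html('<description>&lt;div&gt;By Marina Correa&lt;/div&gt;</description>')` gives back `<div>By Marina Correa</div>`, which ordinary HTML tooling can then process.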

I need a regular expression that finds the data outside the HTML tags, i.e. the actual text, not the tag names, class names, etc.

For parsing HTML that uses literal "<" and ">", I found this: (?<=^|>)[^><]+?(?=<|$)

However, I don't know how to adapt it to the escaped form. Any help is much appreciated.
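If the escaped string has to be matched as-is, one option is to adapt the regex from the question by replacing `<` and `>` with `&lt;` and `&gt;`. Below is a minimal sketch in Python. The question does not name a language, and the name `text_outside_tags` is hypothetical. Python's `re` module requires fixed-width look-behinds, so the look-behind `(?<=^|&gt;)` is rewritten as a consuming prefix followed by a capture group.

```python
import re

# A text run starts right after a closing "&gt;" (or at the start of the
# string) and stops before the next "&lt;" (or the end of the string).
# The run itself may not contain "&lt;" or "&gt;", mirroring [^><] in the
# original pattern.
TEXT_BETWEEN_TAGS = re.compile(
    r"(?:^|&gt;)((?:(?!&lt;|&gt;).)+?)(?=&lt;|$)", re.DOTALL
)


def text_outside_tags(escaped: str) -> list[str]:
    """Return the non-blank text runs that lie outside the escaped tags."""
    runs = TEXT_BETWEEN_TAGS.findall(escaped)
    return [run.strip() for run in runs if run.strip()]
```

This assumes the outer `<description>` wrapper has already been removed; otherwise it would be captured as text. Like any regex over markup, the pattern breaks if an attribute value itself contains `&gt;`.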



  • dsxfa26482 2013-11-21 11:14

    For decoding, you can use htmlspecialchars_decode.

    For more details, please see http://php.net/manual/en/function.htmlspecialchars-decode.php
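The decode-then-parse approach from this answer can be sketched in Python. This is an assumption: the answer itself is about PHP. Here `html.unescape` stands in for `htmlspecialchars_decode`, but it decodes all named and numeric entities, so it behaves more like PHP's `html_entity_decode`. After decoding, an HTML parser collects only the character data, which skips tag names and attributes without a hand-written regex. The names `TextCollector` and `visible_text` are hypothetical.

```python
import html
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects non-blank character data, ignoring tags and attributes."""

    def __init__(self) -> None:
        # convert_charrefs=True also turns leftover references such as
        # "&#39;" in the text into the characters they represent.
        super().__init__(convert_charrefs=True)
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        if data.strip():
            self.chunks.append(data.strip())


def visible_text(escaped: str) -> list[str]:
    # One decoding pass: "&lt;" -> "<", "&gt;" -> ">", "&amp;#39;" -> "&#39;".
    decoded = html.unescape(escaped)
    parser = TextCollector()
    parser.feed(decoded)
    parser.close()
    return parser.chunks
```

A parser is used rather than a regex because it already copes with quoted attributes, self-closing tags such as the tracking `<img ... />`, and character references.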
