dounieqi6959 2012-03-25 12:51

Remove `= ` from HTML

I have a RoundCube plugin that writes the message body to the database, and afterwards I need to parse the data into another table. Using certain RoundCube functions, I can remove all HTML tags, with </td> replaced by ' ' and </tr> replaced by ' '. This makes parsing my data easy and robust. There is only one drawback: the HTML is broken into fixed-length lines with an = at the end, e.g.:

<td valign=3D"bottom" style=3D"color:#444444;padding:5px 10px 5=
px 0px;font-size:12px;border-bottom:1px solid #eeeeee;"><b>Discount</b></td=
><td valign=3D"bottom" align=3D"right" style=3D"color:#444444;padding:5px 0=
px 5px 0px;font-size:12px;border-bottom:1px solid #eeeeee;text-align:right;=
"><b>Price after discount</b></td>

Now the </td= sequences are not recognised, so Discount is joined to Price after discount as DiscountPrice after discount instead of Discount Price after discount. This happens throughout the message and is causing me severe problems.

I tried to remove the = and break with things like:

$msg_body = str_replace('=', '', $msg_body);
$msg_body = str_replace('=
', '', $msg_body);
$msg_body = str_replace('= ', '', $msg_body);

with no real success. I do not know which type of break comes after the = sign, whether it is a line break or a paragraph break. I tried to find out, even looking at the RoundCube code, but in vain. Echoing out the HTML did not reveal anything either.

I am posting this as a general PHP and HTML question in the hope that someone can help me remove these = signs and the (to me) mysterious breaks, so that

</td=
>

becomes

</td>

, etc.

3 answers

  • dtpoius74857 2012-03-25 12:55

    Depending on the system that produced the text, the line break can be:

    \n (Unix)
    \r\n (Windows)
    \r (old Mac)

    So check for all of those as well.
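A minimal sketch of checking all three variants in one call, assuming the message body is a plain string like `$msg_body` in the question. Note that PHP only interprets `\r` and `\n` inside double-quoted strings, so a single-quoted `'=\n'` would search for a literal backslash followed by `n`. The helper name `strip_soft_breaks` is made up for illustration.

```php
<?php
// Hypothetical helper: removes "=" followed by any common line break.
function strip_soft_breaks(string $text): string
{
    // "=\r\n" is listed first so a Windows break is removed as one unit
    // instead of leaving a stray "\n" behind.
    return str_replace(["=\r\n", "=\r", "=\n"], '', $text);
}

// Shortened sample in the style of the question's data.
$msg_body = "<b>Discount</b></td=\r\n><td valign=3D\"bottom\">";
echo strip_soft_breaks($msg_body);
```

This only joins the broken lines; sequences such as `=3D` are left untouched.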

    You can also use a regular expression if you know that only a limited set of tags is affected:

    $msg_body = preg_replace('/(\w+)=(?:\r\n|\r|\n)/', '$1', $msg_body);
    

    In your case, it should transform </td= followed by a line break and > into </td>
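The `=3D` sequences and the `=` at the end of each line look like quoted-printable transfer encoding, where `=3D` stands for a literal `=` and a trailing `=` marks a soft line break. If that is what is happening here (an assumption, since the question does not show the message headers), PHP's built-in `quoted_printable_decode()` should undo both at once. A minimal sketch with a shortened sample adapted from the question:

```php
<?php
// Shortened sample adapted from the question; assumed to be
// quoted-printable encoded with CRLF line endings.
$msg_body = "<td valign=3D\"bottom\" style=3D\"color:#444444;padding:5px 10px 5=\r\n"
          . "px 0px;\"><b>Discount</b></td=\r\n"
          . "><td valign=3D\"bottom\" align=3D\"right\">";

// Decodes "=3D" back to "=" and removes the soft line breaks
// ("=" immediately followed by a line break).
$decoded = quoted_printable_decode($msg_body);

echo $decoded;
```

Decoding before stripping the tags means </td> is intact again when the RoundCube functions look for it, so Discount and Price after discount stay separated.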
