dongyan2267 2014-03-22 21:29

Fastest way to remove HTML entities and whitespace and replace single and double quotes

I have this code:

$text_without_tags = strip_tags($text);
$text_without_unwanted_characters = preg_replace('/&#?[a-z0-9]{2,8};/i', '', $text_without_tags);
$text_without_spaces = preg_replace('/\s+/', ' ', $text_without_unwanted_characters);
$replace_single_quote = str_replace('’', "'", $text_without_spaces);
$replace_double_quotes = str_replace('”', '"', $replace_single_quote);
$replace_minus = str_replace('—', '-', $replace_double_quotes);

Is this the best way to do what I want? The execution time is very long. I have a lot of text, but I'm sure this code is what slows it down.
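To check whether these six calls are really the bottleneck, one option is to time them in isolation. The sketch below wraps the question's code in a hypothetical function `clean_original()` and averages `hrtime()` measurements (PHP 7.3 or later). The synthetic input of roughly 900 KB is an assumption; substitute the real text.

```php
<?php
// Hypothetical wrapper: the question's pipeline, unchanged, inside a function.
function clean_original(string $text): string
{
    $text = strip_tags($text);
    $text = preg_replace('/&#?[a-z0-9]{2,8};/i', '', $text); // drop entities
    $text = preg_replace('/\s+/', ' ', $text);                // collapse whitespace
    $text = str_replace('’', "'", $text);
    $text = str_replace('”', '"', $text);
    return str_replace('—', '-', $text);
}

// Calls $fn on $input $runs times and returns the average time in milliseconds.
function average_ms(callable $fn, string $input, int $runs = 10): float
{
    $start = hrtime(true); // monotonic clock in nanoseconds
    for ($i = 0; $i < $runs; $i++) {
        $fn($input);
    }
    return (hrtime(true) - $start) / $runs / 1e6;
}

// Assumed test data: the same short fragment repeated 20000 times.
$sample = str_repeat('<p>”Sed non risus’ &#13; lacus —augue</p>   ', 20000);
printf("%.2f ms per call\n", average_ms('clean_original', $sample));
```

If the time per call is small compared with the script's total run time, the slowdown is probably somewhere else in the script.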

Later edit: I want to remove HTML entities, collapse runs of whitespace into a single space, and replace the curly quotes ’ and ” and the dash — with the plain ASCII characters ', " and -.

$text =

<div class="body">&#13;
                                <p>”Sed non risus dictum, tempor leo et, bibendum nunc. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos”. Nulla tincidunt, justo vel hendrerit pellentesque, arcu justo auctor tortor, at venenatis urna nisl at lacus. ’Etiam hendrerit’ lacus eu —augue pellentesque consequat ac non tellus. Vestibulum feugiat posuere cursus. Nulla accumsan purus ligula, vel accumsan nunc tincidunt condimentum. Praesent ac nibh luctus, interdum erat dapibus, adipiscing dui. Nunc tempus turpis eu dolor eleifend, in interdum nisi tempor. Mauris at lacinia tellus, pharetra euismod erat. Phasellus placerat tristique orci, lacinia feugiat purus scelerisque eu. Sed felis neque, cursus eu dictum at, blandit sit amet urna. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Proin eu malesuada ante. Quisque dui turpis, sagittis eu molestie eget, porta eu tellus. </p>
<p>Â </p>
<p>Â </p>
<p>Â </p>
<p><img title="x" border="0" alt="z" src="http://placehold.it/600x365" width="600" height="365"/></p>
                                                                &#13;
                            </div>
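The `<p>Â </p>` lines in the sample look like UTF-8 no-break spaces (bytes C2 A0) shown in a Latin-1 view. A pattern without the /u modifier works on bytes, and by default its `\s` only matches ASCII whitespace, so `'/\s+/'` leaves these characters in place. A minimal sketch, assuming the input is valid UTF-8 (the function name `collapse_spaces_utf8()` is hypothetical):

```php
<?php
// Collapses ASCII whitespace and U+00A0 (no-break space) into one space.
// Assumes valid UTF-8: with /u, preg_replace() returns null on invalid input.
function collapse_spaces_utf8(string $text): ?string
{
    // /u makes the pattern match code points instead of bytes; \x{00A0} is
    // listed explicitly so the result does not depend on how \s is defined.
    return preg_replace('/[\s\x{00A0}]+/u', ' ', $text);
}

$sample = "<p>\u{00A0}</p>\n<p>\u{00A0}</p>";
var_dump(collapse_spaces_utf8(strip_tags($sample)));
```

The same pattern could stand in for `'/\s+/'` in the answer's clean() function; if the input may contain invalid UTF-8, check the result for null.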

1 answer

  • douluo7366 2014-03-22 22:02
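    function clean($text) {
      // Innermost call first: strip the tags, swap the three typographic
      // characters, then delete entities and collapse whitespace to one space.
      return preg_replace(
        array('/&#?[a-z0-9]{2,8};/i', '/\s+/'),
        array('',' '),
        str_replace(
          array('’','”','—'),
          array('\'','"','-'),
          strip_tags($text)
        )
      );
    }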
    
    echo clean($text);
    

    Update:

    You can refactor the code to use a single preg_replace() call:

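    function clean($text) {
      // Tags are stripped first; the patterns then run in array order:
      // entities, whitespace, and the three typographic characters.
      return preg_replace(
        array('/&#?[a-z0-9]{2,8};/i', '/\s+/','/’/','/”/','/—/'),
        array('',' ','\'','"','-'),
        strip_tags($text)
      );
    }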
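Passing arrays to str_replace() or preg_replace() does not give a single pass: each pair is applied in turn to the result of the previous one. One alternative, not part of the answer, is strtr() with an array, which replaces all literal pairs in one scan and never searches text it has already substituted. A sketch under that assumption (`clean_strtr()` is a hypothetical name); whether it is measurably faster depends on the data, so it is worth timing:

```php
<?php
// Sketch: strtr() for the literal characters, preg_replace() for the patterns.
function clean_strtr(string $text): string
{
    // One scan over the tag-free text for all three literal replacements.
    $text = strtr(strip_tags($text), [
        '’' => "'",
        '”' => '"',
        '—' => '-',
    ]);
    // Entities are removed before whitespace is collapsed, so the spaces
    // left on both sides of a removed entity end up as a single space.
    return preg_replace(
        ['/&#?[a-z0-9]{2,8};/i', '/\s+/'],
        ['', ' '],
        $text
    );
}

echo clean_strtr('<p>”Etiam hendrerit’ &#13; lacus —augue</p>'), "\n";
```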
    
    This answer was accepted by the asker.