dryift6733 2015-06-18 13:54
74 views
Accepted

Algorithm for sanitizing MySQL data

Let's say I have a MySQL table of 100,000 records with 2 columns: title and description. There's also a table containing all the bad words that need to be sanitized.

For example, suppose the title column contains the string "Fuck this" and the profanity table says that "Fuck" should be replaced with "F***".

Currently I have implemented it with a brute-force method, but it is far too slow: it checks every substring of each sentence against every string in the profanity table.

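public function sanitizeSiteProfanity($word, $replacement)
{
    // result_array() returns a plain PHP array, so count() replaces num_rows().
    $rows = $this->_ci->db->select('title, description')->get('top_sites')->result_array();
    $n = count($rows);
    for ($i = 0; $i < $n; $i++)
    {
        // str_replace() returns the new string rather than modifying its argument.
        $rows[$i]['title'] = str_replace($word, $replacement, $rows[$i]['title']);
        $rows[$i]['description'] = str_replace($word, $replacement, $rows[$i]['description']);
    }
    // The sanitized rows still have to be written back to top_sites.
    return $rows;
}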

Is there a faster method to sanitize all the substrings?

2 answers

  • douziqian2871 2015-06-18 14:03

    I don't know if there is a fast way to sanitize the data. It seems that you have to loop through all the words for the replacement, because one title could have multiple offensive words.
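
If the replacement is done on the PHP side, the loop over words does not have to be written by hand: `str_ireplace()` accepts arrays of search and replacement strings and applies them all to one string in a single call. The sketch below is only an illustration, not the asker's code; the helper name `sanitizeText` and the `$rules` map (bad word => replacement, assumed to be loaded once from the profanity table) are hypothetical.

```php
<?php
// Hypothetical helper: applies every profanity rule to one string in one call.
// $rules maps bad word => replacement (assumed to be loaded once from the profanity table).
function sanitizeText(string $text, array $rules): string
{
    // str_ireplace() takes arrays of search and replacement strings,
    // matches case-insensitively, and applies the pairs in order.
    return str_ireplace(array_keys($rules), array_values($rules), $text);
}

$rules = ['fuck' => 'F***', 'damn' => 'D***'];
echo sanitizeText('Fuck this damn thing', $rules), "\n"; // prints "F*** this D*** thing"
```

Note that this matches substrings rather than whole words, so a short bad word would also be replaced inside a longer harmless word.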

    If you are looking for complete words, a full-text index and MATCH ... AGAINST should speed things up. Essentially, you would loop over the words and run the following for each one:

    update top_sites
        set title = replace(title, 'Fuck', 'F***')
        where match (title) against ('Fuck' in boolean mode);
    

    You would need to put this in a stored procedure loop, but the match() would be quite fast, and this would probably be significantly faster than the current process.
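
As an alternative to a stored procedure, the same per-word loop can be driven from PHP. The following is a minimal sketch under stated assumptions, not a definitive implementation: it uses PDO instead of the CodeIgniter query builder from the question, assumes a FULLTEXT index exists on each column, and the function names `buildSanitizeStatements` and `sanitizeTable` are hypothetical. The table name `top_sites` and the columns come from the question.

```php
<?php
// Hypothetical helper: builds one parameterized UPDATE per profanity rule.
// Assumes a FULLTEXT index exists on the chosen column of top_sites.
function buildSanitizeStatements(array $rules, string $column = 'title'): array
{
    // The column name is interpolated into the SQL, so only known columns are allowed.
    if (!in_array($column, ['title', 'description'], true)) {
        throw new InvalidArgumentException("Unexpected column: $column");
    }
    $statements = [];
    foreach ($rules as $word => $replacement) {
        $statements[] = [
            'sql' => "UPDATE top_sites SET $column = REPLACE($column, :word, :replacement) "
                   . "WHERE MATCH ($column) AGAINST (:needle IN BOOLEAN MODE)",
            // The word is bound twice under different names, because a named
            // placeholder cannot be reused with native prepared statements.
            'params' => [
                ':word' => (string) $word,
                ':replacement' => (string) $replacement,
                ':needle' => (string) $word,
            ],
        ];
    }
    return $statements;
}

// Runs every statement; $pdo is assumed to be a MySQL connection that throws on errors.
function sanitizeTable(PDO $pdo, array $rules, string $column = 'title'): void
{
    foreach (buildSanitizeStatements($rules, $column) as $statement) {
        $pdo->prepare($statement['sql'])->execute($statement['params']);
    }
}
```

One caveat: MySQL's `REPLACE()` is case-sensitive, while full-text matching is usually case-insensitive, so a row containing "fuck" in lowercase may be matched by the WHERE clause yet left unchanged by the rule for "Fuck".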

    This answer was selected by the asker as the best answer.