dryift6733 2015-06-18 13:54
浏览 74
已采纳

清理MySQL数据的算法

Let's say I have a table of 100,000 MySQL records in a table with 2 columns: title and description. There's also a table containing all the bad words that need to be sanitized.

For e.g. let's say the title column contains the string "Fuck this" and the profanity table says that the "Fuck" string should be replaced with "F***".

Currently I implemented it with a brute force method, but this is way too slow. It checks every single substring from the sentence and compares it with every single string that exists in the profanity filter.

public function sanitizeSiteProfanity($word, $replacement)
{
    $query = $this->_ci->db->select('title, description')->get('top_sites')->result_array();
    $n = $query->num_rows();
    for($i = 0; $i < $n; $i++)
    {
        str_replace($word, $replacement, $query[$i]['title']);
        str_replace($word, $replacement, $query[$i]['description']);
    }   
}

Is there a faster method to sanitize all the substrings?

  • 写回答

2条回答 默认 最新

  • douziqian2871 2015-06-18 14:03
    关注

    I don't know if there is a fast way to sanitize the data. It seems that you have to loop through all the words for the replacement, because one title could have multiple offensive words.

    If you are looking for complete words, a full text index and contains should speed things up. Essentially, you would set up a loop for each of the words and then run:

    update table
        set title = replace(title, 'F***')
        where match (title) against ('Fuck' in boolean mode);
    

    You would need to put this in a stored procedure loop. But, the match() would be quite fast and this would probably significantly speed up the current process.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(1条)

报告相同问题?

悬赏问题

  • ¥15 微带串馈天线阵列每个阵元宽度计算
  • ¥15 keil的map文件中Image component sizes各项意思
  • ¥30 BC260Y用MQTT向阿里云发布主题消息一直错误
  • ¥20 求个正点原子stm32f407开发版的贪吃蛇游戏
  • ¥15 划分vlan后,链路不通了?
  • ¥20 求各位懂行的人,注册表能不能看到usb使用得具体信息,干了什么,传输了什么数据
  • ¥15 Vue3 大型图片数据拖动排序
  • ¥15 Centos / PETGEM
  • ¥15 划分vlan后不通了
  • ¥20 用雷电模拟器安装百达屋apk一直闪退