dongyanghan0556 2011-12-10 15:01
浏览 70
已采纳

使用PHP和数据库实现Levenshtein

I have a search form. If the user makes a typo like ager instead of anger, it should still show the relevant results instead of displaying 0 results found.

I came across the PHP levenshtein function and the example that they have given with array is exactly what I want [except that the user can input a sentence rather than one word], but I would like to implement it with database, but have no idea as to how go about implementing it with database.

This is my code:

if(!empty($search))
{
    try {
        $query = $this->_db->prepare($sql);
        $query->execute();
        if(!$query->rowCount()==0)
        {
            $foundRows = $this->_db->query("SELECT FOUND_ROWS()")->fetchColumn();
            while($row = $query->fetch(PDO::FETCH_ASSOC))
            {
                $cQuote =  $this->highlightWords(htmlspecialchars($row['cQuotes']),$search);
                $search_result[] = array('success' => true, 'totalRows' => $foundRows, 'cQuotes' => $cQuote, 'vAuthor' => $this->h($row['vAuthor']), 'vBookName' => $this->h($row['vBookName']), 'vRef' => $this->h($row['vRef']));
            }
            $response = json_encode($search_result);
            echo $response;
            return TRUE;
        }
        else
        {
            $ex =  "No results found for " .$search;
            $this->errorMsg($ex);
        }
        $query->closeCursor();
    }
    catch (Exception $ex){
        $ex =  "Problem: " .$ex;
        $this->errorMsg($ex);
    }
}
else
{
    $ex =  "Please enter something";
    $this->errorMsg($ex);
}

I should add that I'm using MySQL + PDO.

  • 写回答

1条回答 默认 最新

  • douwu8060 2011-12-10 16:28
    关注

    For this to work, you'd need three things:

    An example database schema:

    text

    +---------+----------------------------------------------+
    | text_id | text                                         |
    +---------+----------------------------------------------+
    |       1 | The quick brown fox jumps over the lazy dog  |
    |       2 | The slow brown foxes jump over the lazy dogs |
    +---------+----------------------------------------------+
    

    word

    +-------+---------+
    | word  | text_id |
    +-------+---------+
    | fox   |       1 |
    | foxes |       2 |
    | dog   |       1 |
    | dogs  |       2 |
    +-------+---------+
    

    Once you have that, say someone searches for "foxs dogg", you'd build a query like this one:

    SELECT text FROM text
        LEFT JOIN word w1 ON w1.text_id = text.text_id AND LEVENSHTEIN(w1.word, "foxs") < 3
        LEFT JOIN word w2 ON w2.text_id = text.text_id AND LEVENSHTEIN(w2.word, "dogg") < 3
        GROUP BY text.text_id
        HAVING COUNT(*) = 2
    

    ...where:

    • Each word has a LEFT JOIN (e.g.: foxs and dogg)
    • You have an HAVING clause that contains the total number of words (e.g.: HAVING COUNT(*) = 2)
    • The maximum distance for each word is specified (e.g.: LEVENSHTEIN(...) < 3)

    The above would return both entries.

    Here's another example:

    SELECT text FROM text
        LEFT JOIN word w1 ON w1.text_id = text.text_id AND LEVENSHTEIN(w1.word, "foxs") < 3
        LEFT JOIN word w2 ON w2.text_id = text.text_id AND LEVENSHTEIN(w2.word, "slows") < 3
        GROUP BY text.text_id
        HAVING COUNT(*) = 2
    

    The above would return only text_id = 2.

    Now, before you go crazy implementing this, you should know that multiple JOIN clauses, like the above, on a table having millions of entries (words), will have a very big performance impact.

    While this is a working example, you really should look for an already implemented search algorithm, like Solr's SpellCheck component.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 求差集那个函数有问题,有无佬可以解决
  • ¥15 【提问】基于Invest的水源涵养
  • ¥20 微信网友居然可以通过vx号找到我绑的手机号
  • ¥15 寻一个支付宝扫码远程授权登录的软件助手app
  • ¥15 解riccati方程组
  • ¥15 display:none;样式在嵌套结构中的已设置了display样式的元素上不起作用?
  • ¥15 使用rabbitMQ 消息队列作为url源进行多线程爬取时,总有几个url没有处理的问题。
  • ¥15 Ubuntu在安装序列比对软件STAR时出现报错如何解决
  • ¥50 树莓派安卓APK系统签名
  • ¥65 汇编语言除法溢出问题