douba3975 2013-11-28 11:33
浏览 68
已采纳

重定向拼写错误的搜索php

I have a website where I index information about movies. To find a movie the user can input the title of the movie and this query will be sent to the database:

SELECT IMDB, Name, Year, Views 
FROM Movies 
WHERE Name LIKE '%$search%'

I used the "similar_text" function to solve little mistakes. For example if the name is "Pulp Fiction" and the user types: "Pulp Foction", since I don't get any results I run this little code for every movie.

            similar_text($search, $Name, $percent);

            if ($percent > $ValMax) {
                $ValMax = $percent;
                $PosMax = $i;   
            }

I'm using this code since I only have a couple hundred movies in the database and in the future it's going to be at max about twenty thousand. Also every movie as a title of an average of 20 characters, without considering movies like Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb

The real problem starts with a query used by an user: "Capitan America" (italian for Captain America) where the name of the movie was "Captain America – Il primo vendicatore"

Using the Levenshtein function I got horrible results where with the similar_text they were at least decent. You can see the full results of that query here. I made up this page to see for each search what the levenshtein and the similar_text would be.

The movie that would have been selected if I didn't place a 60% blockage would have been "C'era una volta in America" (Once upon a time in America).

However "Capitan America" and "Captain America" are pretty similar so I was wondering if there was a way to maybe check for each word. I've also heard about Lucene but I didn't know how to start from and If it was suited for my needs.

Thanks :)

  • 写回答

1条回答 默认 最新

  • dsyak22488 2013-11-28 11:51
    关注

    1:

    You could use SOUNDS LIKE (available as a MySQL string function):

    SELECT IMDB, Name, Year, Views FROM Movies WHERE Name SOUNDS LIKE '%$search%'
    

    You might want to benchmark the performance of this on large tables as the preceding wildcard in '%$search%' means that your statement will not be able to avail of any indexes.

    2:

    Another possible solution involves creating a custom levenshtein function for MySQL. Here, you can find an example of one:

    CREATE FUNCTION levenshtein( s1 VARCHAR(255), s2 VARCHAR(255) ) 
      RETURNS INT 
      DETERMINISTIC 
      BEGIN 
        DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT; 
        DECLARE s1_char CHAR; 
        -- max strlen=255 
        DECLARE cv0, cv1 VARBINARY(256); 
        SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0; 
        IF s1 = s2 THEN 
          RETURN 0;
    
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 matlab有关常微分方程的问题求解决
  • ¥15 perl MISA分析p3_in脚本出错
  • ¥15 k8s部署jupyterlab,jupyterlab保存不了文件
  • ¥15 ubuntu虚拟机打包apk错误
  • ¥199 rust编程架构设计的方案 有偿
  • ¥15 回答4f系统的像差计算
  • ¥15 java如何提取出pdf里的文字?
  • ¥100 求三轴之间相互配合画圆以及直线的算法
  • ¥100 c语言,请帮蒟蒻写一个题的范例作参考
  • ¥15 名为“Product”的列已属于此 DataTable