在PHP中加速levenshtein / similar_text

I am currently using similar_text to compare a string against a list of ~50,000 which works although due to the number of comparisons it's very slow. It takes around 11 minutes to compare ~500 unique strings.

Before running this I do check the databases to see whether it has been processed in the past so everytime after the inital run it's close to instant.

I'm sure using levenshtein would be slightly faster and the LevenshteinDistance function someone posted in the manual looks interesting. Am I missing something that could make this significantly faster?

写回答
好问题 0 提建议
追加酬金
关注问题
分享
邀请回答
编辑收藏删除结题
收藏举报

3条回答默认最新

关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
dpo60833 2009-08-05 05:24
关注
In the end, both levenshtein and similar_text were both too slow with the number of strings it had to go through, even with lots of checks and only using them one of them as a last resort.

As an experiment, I ported some of the code to C# to see how much faster it would be over interperated code. It ran in about 3 minutes with the same dataset.

Next I added an extra field to the table and used the double metaphone PECL extension to generate keys for each row. The results were good although since some included numbers this caused duplicates. I guess I could then have run each one through the above functions but decided not to.

In the end I opted for the simplest approach, MySQLs full text which worked very well. Occasionally there are mistakes although they are easy to detect and correct. Also it runs very fast, in around 3-4 seconds.

本回答被题主选为最佳回答 , 对您是否有帮助呢?

解决无用
评论打赏
分享
举报

评论

按下Enter换行，Ctrl+Enter发表内容

查看更多回答(2条)

报告相同问题？

关注问题

levenshtein mysql程序替代php mysql php
2014-07-18 13:01

回答 1 已采纳 PHP has a built in levenshtein function. http://php.net/manual/en/function.levenshtein.php
PHP - 优化 - 具有优先级的Levenshtein距离 php
2013-01-25 16:13

回答 1 已采纳 I think the major slowdown in your function is the fact that it's recursive. As I've said in my c
使用PHP和数据库实现Levenshtein database php
2011-12-10 15:01

回答 1 已采纳 For this to work, you'd need three things: A Levenshtein-distance implementation on the MySQL en
PHP改进计算字符串相似度的函数similar_text()、levenshtein()
2020-10-25 07:45

在上述给定文件中，描述中提到的“改进计算字符串相似度的函数”指的就是为了解决原生`similar_text()`和`levenshtein()`函数对中文汉字支持不好的问题而编写的自定义函数。这个自定义函数的实现思路是，首先将中文...
用PHP Levenshtein比较5000个字符串 database php
2009-12-24 11:30

回答 8 已采纳 I think a better way to group similar addresses would be to: create a database with two tables -
php函数，用于计算字符串中相似字符的数量 php
2014-09-16 12:10

回答 2 已采纳 You can use str_split to convert the strings into arrays, then array_unique and array_intersect to
PHP MySQL - Levenshtein替代十进制 mysql php
2012-01-22 03:06

回答 1 已采纳 There are numerous methods to calculate geographic distance between sets of points. Some are accur
php similar_text()函数的定义和用法
2020-10-22 09:14

- `levenshtein()` 函数在计算速度上可能比 `similar_text()` 快，但在某些情况下，`similar_text()` 可能提供更准确的结果，因为它基于更少的必需修改次数。 - `similar_text()` 不区分大小写，因此在比较时不会...
重定向拼写错误的搜索php mysql php sql
2013-11-28 11:33

回答 1 已采纳 1: You could use SOUNDS LIKE (available as a MySQL string function): SELECT IMDB, Name, Year, Vi
用PHP检查手机输入 php
2013-04-25 08:15

回答 1 已采纳 The most popular predictive texting solutions are proprietary, and I don't think the kind of datab
在没有全局变量的PHP中比较嵌套的foreach变量？ php
2011-04-23 04:24

回答 2 已采纳 foreach ($xml_string->xpath('//location') as $character) { $xml_name = $character-&g
php字符比较函数similar_text、strnatcmp与strcasecmp用法分析
2020-12-19 01:44

值得注意的是，虽然 `levenshtein()` 函数在速度上优于 `similar_text()`，但 `similar_text()` 可能提供更精确的相似度计算，因为它考虑了更少的必要修改次数。示例： ```php $str1 = "hello world"; $str2 =...
php similartext 中文,推荐10款similar_text源码（收藏）
2021-04-21 12:13

呵吁的博客 php similar_text() 函数计算比较两个字符串的相似度，本文章向码农介绍php similar_text() 函数的基本使用方法和基本使用实例，感兴趣的码农可以参考一下。定义和用法 similar_text() 函数计算两个字符串的相似度。...
php similartext 中文,php similar_text()，phpsimilar_text_PHP教程
2021-04-21 12:12

ELAINE TAO的博客 php similar_text()，phpsimilar_textphp similar_text() 函数计算比较两个字符串的相似度，本文章向码农介绍php similar_text() 函数的基本使用方法和基本使用实例，感兴趣的码农可以参考一下。定义和用法similar_...
使用PHP similar text计算两个字符串相似度
2020-10-23 08:46

在PHP编程语言中，`similar_text()`函数是一种用于计算两个字符串相似度的实用工具。它可以帮助开发者在处理文本数据时，评估两个字符串之间的相似性，例如在搜索引擎优化、文本匹配或内容过滤等场景。下面我们将...
php 求相似比,PHP改进计算字符串相似度的函数similar_text()、levenshtein()
2021-04-08 12:58

Ada-苏婉妤的博客 PHP 原生的similar_text()函数、levenshtein()函数对中文汉字支持不好，我自己写了一个，测试使用正常，推荐给大家，如果有什么问题，请留言similar_text()中文汉字版复制代码代码如下://拆分字符串function split_...
similartext php,similar_text - PHP 5 中文文档
2021-04-15 13:51

weixin_39774644的博客 = 3.0.7, PHP 4, PHP 5)similar_text–Calculate the similarity between two stringsDescriptionint similar_text( string first, string second [, float &percent] )This calculates the similarit...
文本相似度php,分析php计算文本字符串相似度函数similar_text()的原理
2021-04-02 08:03

肉依娜娜的博客 PHP有个计算两个文本字符串相似度的函数similar_text()，可以得出一个百分比来表示两个字符串的相似程度。效果如下：similar_text('aaaa', 'aaaa', $percent);var_dump($percent);//float(100)similar_text('aaaa', ...
php 对比字符串相似度,php计算字符串相似度similar_text
2021-03-23 19:45

Gistie的博客因为发送邮件要限制发送频率，有一些邮件都是同类型的邮件，只是时间不一样，这样就...similar_text计算字符串相似度实际上 similar_text 接收3个参数，第3个参数是引用传递，表示相似百分比，函数是返回相似的字节...
similartext php,使用PHPsimilartext计算两个字符串相似度，similartext_PHP教程
2021-04-15 13:50

远方Sep的博客使用PHP similar text计算两个字符串相似度，similartext在网站开发中，我们经常使用php similar text 计算两个字符串相似度;1,similar_text的用法如果我想计算"ly89cn"和"ly89"的相似程度，有两种表示方法代码如下:...
没有解决我的问题, 去提问

悬赏问题

¥20 C＋＋通过HICON获取argb像素数组
¥15 如何利用支持向量机提高分类器正确率和筛选理想分类器
¥15 Pygame坦克大战游戏开发实验报告
¥15 用vmmare虚拟机用sentaurus仿真的时候，调用terminal程序，输入swb指令弹出这个，打不开workbench，桌面上面的sentaurus workbench也打不开
¥75 使用winspool.drv的SetPrinter设置打印机失败
¥15 simulink 硬件在环路仿真
¥15 python动态规划：N根火柴摆出的最大数字
¥20 (标签-excel)
¥200 求idea10和MyEclipse7.1
¥20 vb6.0截取当前窗体保存为jpg文件

在PHP中加速levenshtein / similar_text

3条回答 默认 最新

悬赏问题

3条回答默认最新