Fast comparison with Levenshtein [closed]

I'm trying to find the best way to compare one text (max length: 300 characters) against 300,000 others using Levenshtein distance. In the end I need a web service with a simple REST API. In the future there will be far more than 300,000 entries.

In the background I'm using a simple MySQL database. My first thought was to let MySQL do the job. For that I found this: https://github.com/juanmirocks/Levenshtein-MySQL-UDF

But that is way too slow, so I also tried implementing my idea in Java and PHP. This is what I got in the worst case (longest text):

MySQL: 70 seconds
Java: 45 seconds
PHP: 17 seconds

Actually, PHP doesn't sound that bad, but it is not easy to build a web service with it that loads my whole table (300,000 entries) into an array and then just sits there waiting for requests. If I'm wrong, please let me know!

Now I'm looking for advice or maybe a solution. I thought about creating a web service in Go. I don't know whether it will be faster than PHP, but I could easily create a web service with it.
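Whatever language ends up hosting the service, much of the cost in a scan like this usually comes from computing full distances for candidates that cannot win. The following is only a sketch of that idea, not the asker's code: it assumes the goal is the single closest entry within some maximum distance, and the names `bounded_levenshtein`, `find_closest`, and `max_dist` are hypothetical. It skips candidates whose length difference already exceeds the limit and abandons a comparison as soon as a whole row of the dynamic-programming table is above the limit.

```python
from typing import List, Optional, Tuple


def bounded_levenshtein(a: str, b: str, max_dist: int) -> Optional[int]:
    """Return the Levenshtein distance of a and b, or None if it exceeds max_dist."""
    # The length difference is a lower bound on the edit distance.
    if abs(len(a) - len(b)) > max_dist:
        return None
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        # The row minimum never decreases, so the final distance
        # cannot come back under the limit once every cell exceeds it.
        if min(current) > max_dist:
            return None
        previous = current
    return previous[-1] if previous[-1] <= max_dist else None


def find_closest(query: str, candidates: List[str],
                 max_dist: int) -> Optional[Tuple[str, int]]:
    """Scan an in-memory list and return (candidate, distance) for the best match."""
    best = None
    limit = max_dist
    for candidate in candidates:
        d = bounded_levenshtein(query, candidate, limit)
        if d is not None:
            best = (candidate, d)
            limit = d - 1  # from now on only strictly better matches matter
            if limit < 0:
                break      # exact match found, nothing can beat it
    return best
```

For example, `find_closest("kitten", ["sitting", "mitten", "kitchen", "banana"], 3)` tightens the limit after each hit, so later candidates are rejected after very little work. Whether this helps enough for 300,000 entries depends on how small `max_dist` can be in practice.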


1 answer

dongtanghuan1885 2019-09-07 00:22
Not sure if sticking with the MySQL database is a requirement, but a quick way out is to index the comparison strings in Elasticsearch and simply run a fuzzy search query against them. Elasticsearch comes with its own REST API that you can call directly; see https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html.
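As a rough illustration of what such a request could look like, the sketch below builds the JSON body of a fuzzy query in the shape described on the linked page. The field name `text` and the helper `build_fuzzy_query` are assumptions made up for this example; the body would be sent to the `_search` endpoint of whatever index holds the strings. Note that the fuzzy query matches individual terms and caps the allowed edit distance at 2, so it approximates rather than reproduces a full Levenshtein comparison of 300-character texts.

```python
import json


def build_fuzzy_query(field: str, value: str, fuzziness: str = "AUTO") -> str:
    """Build the JSON body for an Elasticsearch fuzzy query on one field."""
    body = {
        "query": {
            "fuzzy": {
                field: {               # hypothetical field name supplied by the caller
                    "value": value,    # the term to match approximately
                    "fuzziness": fuzziness,
                }
            }
        }
    }
    return json.dumps(body)


if __name__ == "__main__":
    # Print a request body for the (assumed) field "text".
    print(build_fuzzy_query("text", "levenstein"))
```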
