doutan4831 2019-09-07 00:11

Fast comparison with Levenshtein [closed]

I'm trying to find the best way to compare one text (max length: 300 characters) against 300,000 others using Levenshtein distance. In the end I need a web service with a simple REST API. In the future there will be far more than 300,000 entries.

In the background I'm using a simple MySQL database. My first thought was to let MySQL do the job, and for that I found this: https://github.com/juanmirocks/Levenshtein-MySQL-UDF

But that is way too slow, so I tried implementing my idea in Java and PHP. This is what I got in the worst case (longest text):

MySQL: 70 seconds
Java: 45 seconds
PHP: 17 seconds

Actually, PHP doesn't sound that bad, but it is not easy to build a web service with it that loads my whole table (300,000 entries) into an array and then just sits there waiting for requests. If I'm wrong, please let me know!

Now I'm looking for any advice, or maybe a solution. I thought about creating a web service in Go. I don't know whether it would be faster than PHP, but I could easily build a web service with it.
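For illustration, here is a minimal sketch of what the in-memory comparison could look like in Go. It is not a benchmarked solution: the function names `levenshtein` and `closest` are made up for this example, the entries are assumed to be already loaded from the table into a slice, and the only optimization is skipping entries whose length difference already rules them out.

```go
package main

import "fmt"

// levenshtein returns the edit distance between a and b.
// It keeps only two rows of the DP table instead of the full matrix.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// closest scans all entries and returns the index and distance of the
// nearest one, or (-1, -1) if entries is empty.
func closest(query string, entries []string) (int, int) {
	best, bestDist := -1, -1
	qLen := len([]rune(query))
	for i, e := range entries {
		// The length difference is a lower bound on the edit distance,
		// so entries that cannot beat the current best are skipped.
		diff := qLen - len([]rune(e))
		if diff < 0 {
			diff = -diff
		}
		if best >= 0 && diff >= bestDist {
			continue
		}
		if d := levenshtein(query, e); best < 0 || d < bestDist {
			best, bestDist = i, d
		}
	}
	return best, bestDist
}

func main() {
	// Stand-in data; in the real service this would come from MySQL.
	entries := []string{"kitten", "sitting", "mitten"}
	i, d := closest("sittin", entries)
	fmt.Println(entries[i], d)
}
```

A long-running Go process could load the slice once at startup and call `closest` from an HTTP handler, which is the "sits there and waits for requests" setup described above. Whether this beats the PHP timing would have to be measured.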

1 answer

  • dongtanghuan1885 2019-09-07 00:22

    Not sure if sticking with the MySQL database is a requirement, but a quick way out is to insert the comparison strings into an Elasticsearch index and simply query it with a fuzzy query. Elasticsearch comes with its own set of APIs that you can just call; see https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html.
