PHP / MySQL - Finding items with similar or matching properties

I'm trying to develop a way of taking an entity with a number of properties and searching the database for similar entities (matching as many of the properties, in the correct order, as possible). The idea is that it would then return a percentage indicating how similar each one is.

The order of the properties should also be taken into account: properties at the beginning are more important than those at the end.

For example:

Item 1 - A, B, C, D, E

Item 2 - A, B, C, D, E

Would be a 100% match.

Item 1 - A, B, C, D, E

Item 2 - B, C, A, D, E

This wouldn't be a perfect match, as the properties are in a different order.

Item 1 - A, B, C, D, E

Item 2 - F, G, H, I, A

Would be a low match, as only one property is the same and it is in position 5.
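One way to turn these examples into a percentage is a position-weighted score. The scheme below is an assumption; the question doesn't fix a formula. With n positions, the property at position i (0-based) in the first item carries weight n - i. If it also appears in the second item at position j, it earns that weight scaled by 1 - |i - j| / n. The score is the earned weight divided by the maximum possible weight, n(n + 1)/2. Properties are assumed to occur at most once per item, and the function name is hypothetical.

```python
def weighted_similarity(a: list[str], b: list[str]) -> float:
    """Hypothetical position-weighted similarity between two property lists, 0-100."""
    n = max(len(a), len(b))
    if n == 0:
        return 100.0  # two empty items are treated as identical
    # Position of each property in b (properties assumed unique per item).
    pos_in_b = {prop: j for j, prop in enumerate(b)}
    total = n * (n + 1) / 2  # weights n, n-1, ..., 1 summed
    earned = 0.0
    for i, prop in enumerate(a):
        weight = n - i  # earlier properties matter more
        j = pos_in_b.get(prop)
        if j is not None:
            # Full weight in the same position, less the further it has moved.
            earned += weight * (1 - abs(i - j) / n)
    return 100.0 * earned / total


item1 = ["A", "B", "C", "D", "E"]
print(round(weighted_similarity(item1, ["A", "B", "C", "D", "E"]), 1))  # 100.0
print(round(weighted_similarity(item1, ["B", "C", "A", "D", "E"]), 1))  # 77.3
print(round(weighted_similarity(item1, ["F", "G", "H", "I", "A"]), 1))  # 6.7
```

The three calls reproduce the examples: identical order scores 100%, a reshuffle scores lower, and a single shared property in the last slot scores very low. Both the weights and the distance penalty can be tuned.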

This algorithm will run against thousands and thousands of records, so it needs to be fast and efficient. Any thoughts on how I could do this in PHP/MySQL?

I was considering Levenshtein, but as far as I can tell it measures the spelling distance between two words, character by character, even when they are completely different words. That doesn't appear to be ideal for this scenario, unless I'm just using it the wrong way.
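Levenshtein's recurrence isn't tied to characters. It works on any sequence, so it can treat each whole property as a single symbol. Here is a sketch in Python; the function names are hypothetical. PHP's built-in `levenshtein()` compares strings character by character, so in PHP you would need your own implementation of this loop. Alternatively, you could first map each distinct property to a single character. Plain edit distance treats every position equally, so on its own it does not capture "earlier properties matter more".

```python
def token_levenshtein(a: list[str], b: list[str]) -> int:
    """Edit distance where each property counts as one symbol."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


def edit_similarity(a: list[str], b: list[str]) -> float:
    """Express the distance as a 0-100 percentage of the longer list."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - token_levenshtein(a, b) / longest)


item1 = ["A", "B", "C", "D", "E"]
print(token_levenshtein(item1, ["A", "B", "C", "D", "E"]))  # 0
print(token_levenshtein(item1, ["B", "C", "A", "D", "E"]))  # 2
print(token_levenshtein(item1, ["F", "G", "H", "I", "A"]))  # 5
```

The third case shows the limitation. The shared A in position 5 contributes nothing, because substituting all five properties is already the cheapest edit.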

It might be possible to do this solely in MySQL, perhaps using a full-text search or something similar.
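To keep the work in the database, one option is to store one row per property together with its position and index by property. A self-join can then score every item that shares at least one property with the target. This schema is an assumption; the table and column names are hypothetical.

The sketch runs against an in-memory SQLite database from Python's standard library only so that it is runnable as-is. The query itself uses plain JOIN, ABS, SUM and GROUP BY, so it should carry over to MySQL with little change. The score weights position i of the target by n - i and scales each match by 1 - |i - j| / n, where j is the position in the candidate.

```python
import sqlite3

N = 5  # assumed fixed number of properties per item
TOTAL_WEIGHT = N * (N + 1) / 2  # 5 + 4 + 3 + 2 + 1 = 15

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per (item, position, property).
conn.execute("CREATE TABLE item_property (item_id INTEGER, pos INTEGER, property TEXT)")
# Lets the join find items sharing a property without scanning every row.
conn.execute("CREATE INDEX idx_property ON item_property (property, item_id, pos)")

items = {
    1: "A B C D E".split(),
    2: "A B C D E".split(),
    3: "B C A D E".split(),
    4: "F G H I A".split(),
}
conn.executemany(
    "INSERT INTO item_property (item_id, pos, property) VALUES (?, ?, ?)",
    [(item_id, pos, prop) for item_id, props in items.items() for pos, prop in enumerate(props)],
)

# Earlier positions in the target weigh more (n - pos); a match loses
# weight the further it has moved (1 - |pos difference| / n).
QUERY = """
SELECT c.item_id,
       100.0 * SUM((:n - t.pos) * (1.0 - ABS(t.pos - c.pos) * 1.0 / :n)) / :total AS pct
FROM item_property AS t
JOIN item_property AS c
  ON c.property = t.property AND c.item_id <> t.item_id
WHERE t.item_id = :target
GROUP BY c.item_id
ORDER BY pct DESC
"""


def similar_items(target: int) -> list[tuple[int, float]]:
    params = {"n": N, "total": TOTAL_WEIGHT, "target": target}
    return conn.execute(QUERY, params).fetchall()


for item_id, pct in similar_items(1):
    print(item_id, round(pct, 1))
```

Items that share no property with the target never appear in the join, so they are implicitly scored 0%. That is also what keeps the query cheap when most items are unrelated.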

This seems like a nice solution, though not designed for this scenario. Perhaps binary comparison could be used in some way?
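Binary comparison can work as a cheap prefilter, although it ignores order. A sketch, assuming each distinct property is given its own bit (the helper names are hypothetical): an item becomes a bitmask, and counting the set bits of two masks ANDed together gives the number of shared properties. In MySQL the rough equivalent is `BIT_COUNT(mask1 & mask2)` on integer columns. That limits you to about 64 distinct properties per BIGINT, so this suits narrowing down candidates before an order-aware score rather than replacing it.

```python
def assign_bits(items: list[list[str]]) -> dict[str, int]:
    """Give every distinct property its own bit (hypothetical helper)."""
    bits: dict[str, int] = {}
    for props in items:
        for prop in props:
            if prop not in bits:
                bits[prop] = 1 << len(bits)
    return bits


def to_mask(props: list[str], bits: dict[str, int]) -> int:
    """OR together the bits of all properties in one item."""
    mask = 0
    for prop in props:
        mask |= bits[prop]
    return mask


def shared_properties(mask1: int, mask2: int) -> int:
    # Count set bits in the intersection; positions are ignored entirely.
    return bin(mask1 & mask2).count("1")


items = [list("ABCDE"), list("BCADE"), list("FGHIA")]
bits = assign_bits(items)
masks = [to_mask(props, bits) for props in items]
print(shared_properties(masks[0], masks[1]))  # 5
print(shared_properties(masks[0], masks[2]))  # 1
```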

dongwu9063 2011-04-25 08:44
What I'd do is encode the order and the property value into a number. Numbers have the advantage of fast comparisons.

This is a general idea and may still need some work, but I hope it helps in some way.

Calculate a number (some form of hash) for each property, and multiply it by a number representing the position at which that property appears in the item.

Say item1 has 3 properties: A, B and C.

hash(A) = 123, hash(B) = 345, hash(C) = 456

Then multiply each hash by a factor based on its order of appearance, given that we have a known number of properties:

(hash(A) * 1,000,000) + (hash(B) * 1,000) + (hash(C) * 1) = someval (here 123,345,456)

The magnitude of the multiplier can be tweaked to suit your data set, as long as each step is larger than any hash value so that neighbouring hashes don't overlap. You'll have to pick the hash function; Soundex, maybe?
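Here is a minimal sketch of this packing step in Python; the arithmetic ports directly to PHP. The per-property codes are the example values above, and the actual hash function is left open. The function name `pack` is hypothetical. Each code must be smaller than the base, otherwise neighbouring codes would overlap and could not be separated again. With base 1,000, a signed 64-bit integer (a MySQL BIGINT, or a PHP int on 64-bit builds) can hold at most six such codes.

```python
def pack(codes: list[int], base: int = 1000) -> int:
    """Combine per-property codes into one number, first property most significant."""
    value = 0
    for code in codes:
        if not 0 <= code < base:
            raise ValueError(f"code {code} does not fit in base {base}")
        # Shift everything so far one "digit" to the left, then append this code.
        value = value * base + code
    return value


print(pack([123, 345, 456]))  # 123345456, i.e. 123*1,000,000 + 345*1,000 + 456*1
```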

The problem is now reduced to a question of uniqueness, because of possible hash collisions. We can still be sure about properties that don't match, though: different hashes always mean different properties.

This also makes it relatively easy to check whether a property appears in another item in a different position. Use the magnitude of the multiplier to extract each property's hash from the generated number.
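Extracting the hashes again is repeated division by the base. Here is a sketch under the same assumptions: a fixed number of properties, every code below the base, and hypothetical function names. It also reports where a given property code sits in another item.

```python
from typing import Optional


def unpack(value: int, count: int, base: int = 1000) -> list[int]:
    """Split a packed number back into `count` codes, first property first."""
    codes = []
    for _ in range(count):
        value, code = divmod(value, base)  # peel off the least significant code
        codes.append(code)
    return codes[::-1]  # codes were peeled from the end, so reverse them


def position_in(code: int, packed: int, count: int, base: int = 1000) -> Optional[int]:
    """0-based position of `code` in a packed item, or None if it is absent."""
    codes = unpack(packed, count, base)
    return codes.index(code) if code in codes else None


print(unpack(123345456, 3))            # [123, 345, 456]
print(position_in(123, 345456123, 3))  # 2: same property, but last instead of first
```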

HTH.

Edit: an example of checking matches.

Given item1(a b c) and item2(a b c), the computed hashes of the items are equal. This is the best-case scenario; no further computation is required.

Given item1(a b c) and item2(d e a), the computed hashes of the items are not equal, so proceed to break them down into property hashes...

Say the hash table for properties is a = 1, b = 2, c = 3, d = 4, e = 5, with 10^n as the multiplier. The computed hash for item1 is 123 and for item2 is 451. Break each computed hash down into its property hashes, so that item1 becomes item1(1 2 3) and item2 becomes item2(4 5 1). Then compare every combination of one property from item1 with one property from item2, and compute the score.
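The answer stops at "compute the score" without fixing a formula, so the scoring in this sketch is an assumption. It uses the single-digit codes and base 10 from the example. It short-circuits on equal numbers, the best case above, although a hash collision could in principle make two different items look equal. Otherwise it compares every pair of codes. A match at positions i and j earns (n - i) * (1 - |i - j| / n) out of a maximum of n(n + 1)/2, so early properties weigh more and moved properties earn less. Properties are assumed unique within an item, and the function names are hypothetical.

```python
def unpack(value: int, count: int, base: int = 10) -> list[int]:
    """Split a packed number into `count` codes, first property first."""
    codes = []
    for _ in range(count):
        value, code = divmod(value, base)
        codes.append(code)
    return codes[::-1]


def match_score(packed1: int, packed2: int, count: int, base: int = 10) -> float:
    """Hypothetical 0-100 score between two packed items."""
    if packed1 == packed2:
        return 100.0  # best case: identical numbers, nothing to break down
    props1 = unpack(packed1, count, base)
    props2 = unpack(packed2, count, base)
    earned = 0.0
    for i, p in enumerate(props1):
        for j, q in enumerate(props2):  # every combination, one from each item
            if p == q:
                earned += (count - i) * (1 - abs(i - j) / count)
    return 100.0 * earned / (count * (count + 1) / 2)


# Hash table from the example: a=1, b=2, c=3, d=4, e=5.
print(match_score(123, 123, 3))            # 100.0  item1(a b c) vs item2(a b c)
print(round(match_score(123, 451, 3), 1))  # 16.7   item1(a b c) vs item2(d e a)
```

The pairwise loop is quadratic in the number of properties, which is negligible for five or six. At scale, the cost that matters is how many candidate items each item is compared against.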

Another way of looking at it: you're still comparing the properties one by one, except this time you're working with numbers instead of the actual string values.

This answer was selected by the asker as the best answer.


