dongxiji0687 2012-11-18 16:05
Views: 44
Answer accepted

How can I compare two tables intelligently?

Consider the following two SQL tables:

Table 1                                                                       Table 2

+-------+-------------------------+   +-------+------------------------------+
| USD   | Model                   |   | USD   | Model                        |
+-------+-------------------------+   +-------+------------------------------+
| 700   | iPad 2 WiFi 16GB        |   | 710   | iPad2 WiFi 16GB              |
| 400   | iPhone 4S 16GB          |   | 450   | iPhone4S 16GB                |
| 250   | iPod Touch(4th Gen)8GB  |   | 200   | iPod Touch 4th Generation 8GB|
+-------+-------------------------+   +-------+------------------------------+

I am stuck on comparing the data present in two different tables intelligently. I have dug around a lot on searching and comparison techniques, and I found

  • similar_text()
  • soundex()
  • metaphone()
  • LEVENSHTEIN()
  • like
  • fulltext
  • regexp

...in PHP and MySQL, but none of them is efficient enough. similar_text() and levenshtein() give really good results, but their worst drawback is that they are extremely slow over ~1000 rows. soundex() and metaphone() return the same code for items that are not alike, e.g. "iphone" and "ipad", which are not the same. All I want is to efficiently recognize that two rows like "iPhone 4S 16GB" and "iPhone4S 16GB" (or the other pairs in the example above) refer to the same item, and the comparison must run quickly. Please let me know what my options are for this kind of comparison. I would really appreciate any idea or hint.

Note: one of my tables contains around 900 rows.

This is a continuation of:

Compare two arrays and sort WRT USD

Pattern comparing with mysql between two tables column


4 answers

  • dscuu86620 2012-11-18 17:20

    I covered this kind of thing when doing a spam detector (loads of research, and then ditched the idea later, but moving on...).

    Basically, do not use LIKE: it is slow on large text, and index support is limited. For example:

    LIKE '%hello' cannot use an index, but LIKE 'hello%' can. Also, large fields result in large indexes to make them work as you intend (they are fine for, say, email addresses, which tend to be short).

    Use = instead; with MySQL's default collations it is also case-insensitive, which you must have for this.

    Next, add a new field to the tables which contains the already-parsed metaphone() representation (this means it only has to be calculated ONCE).

    Now you have a table of, say, 1000 records, each with its metaphone version as well as the original. You MUST do this to get the efficiency you require. When you want to see if some text already exists, you just convert the new text to its metaphone version, then search the tables for it (searching on the metaphone-parsed field). Much quicker ;)
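    The precomputed-key approach above can be sketched as follows. This is a minimal Python/SQLite sketch, not the answerer's actual code: Python's standard library has no metaphone(), so a simple normalization key (with an assumed "generation" → "gen" synonym tweak) stands in for the stored metaphone() column; in PHP/MySQL you would store metaphone($model) and index that column instead.

```python
import re
import sqlite3

# Assumed domain-specific synonym map; extend as needed for your data.
SYNONYMS = {"generation": "gen"}

def match_key(model: str) -> str:
    """Stand-in for metaphone(): lowercase, strip punctuation/spaces."""
    words = re.findall(r"[a-z0-9]+", model.lower())
    return "".join(SYNONYMS.get(w, w) for w in words)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (usd INT, model TEXT, mkey TEXT)")
conn.execute("CREATE TABLE t2 (usd INT, model TEXT, mkey TEXT)")
conn.execute("CREATE INDEX idx1 ON t1(mkey)")  # indexed equality lookups
conn.execute("CREATE INDEX idx2 ON t2(mkey)")

t1 = [(700, "iPad 2 WiFi 16GB"), (400, "iPhone 4S 16GB"),
      (250, "iPod Touch(4th Gen)8GB")]
t2 = [(710, "iPad2 WiFi 16GB"), (450, "iPhone4S 16GB"),
      (200, "iPod Touch 4th Generation 8GB")]
for usd, model in t1:
    conn.execute("INSERT INTO t1 VALUES (?,?,?)", (usd, model, match_key(model)))
for usd, model in t2:
    conn.execute("INSERT INTO t2 VALUES (?,?,?)", (usd, model, match_key(model)))

# One indexed equality join on the precomputed key replaces row-by-row
# LIKE / LEVENSHTEIN comparisons.
pairs = conn.execute(
    "SELECT t1.model, t1.usd, t2.usd FROM t1 JOIN t2 ON t1.mkey = t2.mkey"
).fetchall()
```

    With the sample data above, all three rows pair up, e.g. "iPhone 4S 16GB" matches "iPhone4S 16GB" because both normalize to the same key.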

    To improve accuracy, you may want to delete all the common words and remove punctuation such as:

    • and = deleted
    • , = deleted
    • ' = deleted
    • has = deleted
    • it's = its or it is (depending on which you prefer)

    Then collapse any run of whitespace, e.g. 5 spaces, into a single space.

    The nature of what you are doing will have hundreds of little tweaks you can do to perfect it for what you need it for.

    This answer was selected as the best answer by the asker.
