ds34222222 2015-05-04 19:39
浏览 47
已采纳

组合两个松散连接的MySQL表

So I have two MYSQL tables in a DB which aren't connected (two different data sources) [around 500,000 rows in each]. They have things like Business Name, Business Address, City, State, along with some specific information in each that I hoped to populate a third table with once I've figured out how to 'link' the two together.

My initial thought was to try to link them together if the Business Name / City / Address were the same.

The issue is, for a lot of rows, there isn't an exact match between the two tables. So in one the business name might be 'XYZ' in another 'XYZ Incorporated', same goes for some addresses, it could be '123 NW 10th St.' or '123 Northwest Tenth Street'.

So now I'm trying to figure out a way to find a 'close enough' match, and that's where I'm lost. Never done anything like that before, and wouldn't know where to start. My thoughts were I'd be able to set some kind of threshold where if the match is close enough, I'd insert the two IDs into another table linked the two together for future processing.

(Databases are currently InnoDB / MySQL)

Thanks a lot for any help!

  • 写回答

1条回答 默认 最新

  • duanba4254 2015-05-04 19:47
    关注

    I would suggest creating a View which combines the two tables. In order to find out which fields match most closely, I would recommend using either a "Levenshtein" distance, or something a big smarter like "Jaro/Winkler". I went through something similar to this a while ago and I blogged about it.

    http://dannykopping.com/blog/fuzzy-text-search-mysql-jaro-winkler

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 素材场景中光线烘焙后灯光失效
  • ¥15 请教一下各位,为什么我这个没有实现模拟点击
  • ¥15 执行 virtuoso 命令后,界面没有,cadence 启动不起来
  • ¥50 comfyui下连接animatediff节点生成视频质量非常差的原因
  • ¥20 有关区间dp的问题求解
  • ¥15 多电路系统共用电源的串扰问题
  • ¥15 slam rangenet++配置
  • ¥15 有没有研究水声通信方面的帮我改俩matlab代码
  • ¥15 ubuntu子系统密码忘记
  • ¥15 保护模式-系统加载-段寄存器