douquqiang1513 2011-11-24 10:48
浏览 62

拼写检查街道地址的最佳方法是什么?

When importing new addresses to my DB, I do a spellchek to see if the street already exists (the new street is only spelled wrong).

We are currently usingthe Levenshtein method in MySQL query to find similar street names. The problem is street numbers. Having street nmbers in the address really slows down the similarity search / spellcheking.

Example:

Street abc 34
Street abc 37
Street abc 39

These street names is spelled correctly, but the Levenshtein method thinks they are misspelled because of the street numbers.

We have develope a PHP function that takes anything after (and including) the first digit and puts it in another column.

This works fine for streets having the street number at the end, but will not work for countries having the street numbers at the start.

I'm wondering if anyaone else have worked on similar problems?

Update
The solution is for a store locator web site and I'm currently working on the module that will import store lists.

One solution is using Google Maps API and see if it returns geo address.

  • 写回答

3条回答 默认 最新

  • doulan3436 2011-11-24 10:56
    关注

    Uh-oh, generic address is an extremely hard problem. My suggestion is that you perform the minimal amount of validation you can tolerate.

    If this is for shipping purposes, for instance, just use dropdowns for the stuff that's going to determine shipping costs, for example. If you have different shipping costs for different countries, just provide a free-form text area with no validation and a countries dropdown. If the user can't spell their address, tough luck. You can have whomever that handles shipping verify the address "humanly". Delivery companies and post companies mostly can deliver parcels to misspelled addresses (Randomcountry's post company probably knows their street names better than you, anyway).

    If you really need precise addresses, try to find a third-party solution for this. Using Google Maps API might work, and there exist paid solutions for this.

    Considering your algorithm, though, the following solution springs to mind; just use a regex to strip numbers (or even non-letters). However, keep in mind that there are correct street names which are numbers (i.e. NY's 9th Avenue).

    评论

报告相同问题?

悬赏问题

  • ¥15 基于卷积神经网络的声纹识别
  • ¥15 Python中的request,如何使用ssr节点,通过代理requests网页。本人在泰国,需要用大陆ip才能玩网页游戏,合法合规。
  • ¥100 为什么这个恒流源电路不能恒流?
  • ¥15 有偿求跨组件数据流路径图
  • ¥15 写一个方法checkPerson,入参实体类Person,出参布尔值
  • ¥15 我想咨询一下路面纹理三维点云数据处理的一些问题,上传的坐标文件里是怎么对无序点进行编号的,以及xy坐标在处理的时候是进行整体模型分片处理的吗
  • ¥15 CSAPPattacklab
  • ¥15 一直显示正在等待HID—ISP
  • ¥15 Python turtle 画图
  • ¥15 stm32开发clion时遇到的编译问题