What is the best way to spell-check street addresses?

When importing new addresses into my DB, I do a spell check to see if the street already exists (that is, the new street is the same one, only spelled wrong).

We are currently using the Levenshtein method in a MySQL query to find similar street names. The problem is street numbers. Having street numbers in the address really slows down the similarity search / spell checking.

Example:

Street abc 34
Street abc 37
Street abc 39

These street names are spelled correctly, but the Levenshtein method thinks they are misspelled because of the street numbers.
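To make the problem concrete, here is a minimal Python sketch of the classic edit-distance calculation. It is not our MySQL implementation, and the function name `levenshtein` is only illustrative; it just shows that a different house number and a genuine typo in the street name get the same score.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


# Correctly spelled, only the house number differs
print(levenshtein("Street abc 34", "Street abc 37"))  # 1
# Misspelled street name, same house number
print(levenshtein("Street abc 34", "Streat abc 34"))  # 1
```

Both pairs are one edit apart, so the distance alone cannot tell a new house number from a misspelling.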

We have developed a PHP function that takes anything after (and including) the first digit and puts it in another column.

This works fine for streets having the street number at the end, but will not work for countries having the street numbers at the start.
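One direction I could imagine is to try both positions. The following is only a rough Python sketch (our real function is in PHP and works differently); the patterns and the name `split_street_number` are assumptions made up for illustration. It looks for a house number at the end first, then at the start, and leaves the address alone otherwise.

```python
import re
from typing import Tuple

# Trailing number, e.g. "Street abc 34" or "Street abc 37 B"
TRAILING = re.compile(r"^\s*(.+?)\s+(\d+\s?[A-Za-z]?)\s*$")
# Leading number, e.g. "34 Main Street"
LEADING = re.compile(r"^\s*(\d+\s?[A-Za-z]?)\s+(.+?)\s*$")


def split_street_number(address: str) -> Tuple[str, str]:
    """Return (street, number); number is '' when none is found."""
    for pattern, street_group, number_group in ((TRAILING, 1, 2), (LEADING, 2, 1)):
        match = pattern.match(address)
        if match:
            return match.group(street_group), match.group(number_group)
    return address.strip(), ""


print(split_street_number("Street abc 34"))
print(split_street_number("34 Main Street"))
print(split_street_number("9th Avenue"))
```

This is still a heuristic: an address with numbers at both ends, or a street name that itself ends in a number, would be split wrongly, so a per-country setting for the number position is probably safer than guessing.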

I'm wondering if anyone else has worked on similar problems?

Update
The solution is for a store locator web site and I'm currently working on the module that will import store lists.

One solution is to use the Google Maps API and see whether it returns a geo address.

3 answers



Uh-oh, generic address validation is an extremely hard problem. My suggestion is that you perform the minimal amount of validation you can tolerate.

If this is for shipping purposes, for instance, just use dropdowns for the fields that determine shipping costs. If you have different shipping costs for different countries, just provide a free-form text area with no validation and a country dropdown. If the user can't spell their address, tough luck. You can have whoever handles shipping verify the address "humanly". Delivery and postal companies can mostly deliver parcels to misspelled addresses (Randomcountry's postal company probably knows their street names better than you do, anyway).

If you really need precise addresses, try to find a third-party solution. Using the Google Maps API might work, and paid solutions exist as well.

Considering your algorithm, though, the following solution springs to mind: just use a regex to strip numbers (or even all non-letters). However, keep in mind that there are correct street names which are numbers (e.g. NY's 9th Avenue).
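A minimal sketch of that idea in Python, assuming house numbers are standalone digit runs with at most one trailing letter; the pattern and the name `strip_house_numbers` are my own illustration, not something from your code. The word boundaries are what keep an ordinal like "9th" in place.

```python
import re

# Standalone digit runs with an optional single letter ("34", "37B").
# "9th" does not match because no word boundary follows "9" or "9t".
HOUSE_NUMBER = re.compile(r"\b\d+[A-Za-z]?\b")


def strip_house_numbers(address: str) -> str:
    """Remove house-number tokens and collapse the leftover whitespace."""
    without_numbers = HOUSE_NUMBER.sub(" ", address)
    return " ".join(without_numbers.split())


print(strip_house_numbers("Street abc 34"))   # Street abc
print(strip_house_numbers("12B Baker St"))    # Baker St
print(strip_house_numbers("9th Avenue 120"))  # 9th Avenue
```

It would still mangle a street written as a bare number, so treat the result as input for the similarity check only and keep the original string around.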

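dphw5101: Alex, thanks for your feedback. Please see my update above. I hadn't considered numeric street names. What I have done now is to have a separate input field in my store registration form. But it will still be a problem when importing stores from a list.
(more than 8 years ago)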




You can use the APIs from FedEx, UPS, or USPS to validate an address. Lots of eCommerce sites do this for shipping addresses; that's why you sometimes see

"Did you mean this address"...

You can also do this with the Google Maps API.

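duanli9569: Thanks for the tip. I just got a "Did you mean this address" suggestion when I ordered PS from Adobe. We are also looking into the Google API, but the free version has some limitations.
(more than 8 years ago)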




This is a very common problem. You can have multiple addresses that all represent the same physical location but are structured differently. For example:

100 North 250 West
100 North 250W
100 North 250 W
100N 250 West
100 N 250 West
100 North 250 West

According to the US Postal Service, the standardized address is 100 N 250 W. Only by resolving each of these addresses to a standardized format can you accurately remove duplicates and ensure consistent results.
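As a toy illustration of what "resolving to a standardized format" means for the variants above, here is a short Python sketch. It is not the USPS standardization rules and not how any verification service works; the table `DIRECTIONS` and the name `normalize_directionals` are assumptions for this example only.

```python
import re

# Hypothetical lookup: spelled-out or lowercase directionals -> abbreviation
DIRECTIONS = {
    "north": "N", "south": "S", "east": "E", "west": "W",
    "n": "N", "s": "S", "e": "E", "w": "W",
}


def normalize_directionals(address: str) -> str:
    """Split glued forms like '250W' and abbreviate directional words."""
    # "250W" -> "250 W", "100N" -> "100 N"; "1st" is untouched because
    # the letter must be the end of the word.
    spaced = re.sub(r"(\d)([NSEW])\b", r"\1 \2", address, flags=re.IGNORECASE)
    tokens = spaced.split()
    return " ".join(DIRECTIONS.get(token.lower(), token) for token in tokens)


variants = [
    "100 North 250 West",
    "100 North 250W",
    "100 North 250 W",
    "100N 250 West",
    "100 N 250 West",
]
for variant in variants:
    print(normalize_directionals(variant))  # 100 N 250 W
```

A rule this naive would also turn a street that is actually named "North" into "N", which is exactly why the extra context described next matters.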

Addresses are extremely difficult to standardize without some additional context. The context that I am referring to is an up-to-date master list of all the valid/deliverable addresses in the country. This is not actually available in a list format (it would be huge) but is available as an API. The US Postal Service makes their API available, and there are other companies that take the USPS data and enhance it through their own API. The enhancements are typically faster service and guaranteed uptime, as well as additional address processing functions and more data returned about the address.

So, the quick answer: the best way to spell-check a street address is to use an API to validate the full address.

In the interest of full disclosure, I'm the founder of SmartyStreets and we do address verification. If you are a nonprofit organization, you can use our services at no charge. There are several address verification companies out there; just do a Google search for "address verification" and you'll find a bunch.

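douqiaolong0528: Thanks for the feedback. For now, we have created configuration files for each country so that we can check some rules. The first version only has rules for Scandinavian addresses. We will look more closely at how to extend this in version 2 of the import tool.
(more than 8 years ago)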