dqzg62440 2014-04-18 08:29

Matching names in PHP

I have a huge database of product names. Before inserting a new product, I'd like to match it against the database to find out whether it already exists, i.e. get the IDs of entries that are the same or very, very similar but written differently, e.g.:

  • iphone 4s
  • i-phone 4s
  • iphone-4s

I don't need to match these entries automatically; I only want to generate matching suggestions and then have a supervisor review them.

I have some ideas. Consider ONE single product name for which I'd like to find the related entry in the database, e.g. "apple iphone-4s". My DB could look like this:

  1. iphone-4s
  2. galaxy s4
  3. iphone 3g
  4. apple nano
  5. samsung anything 4s

  1. Replace special characters such as "-" and "," with a space (apple iphone-4s -> apple iphone 4s), then explode the string into array('apple', 'iphone', '4s'). Then loop over each entry of this array, check whether it occurs in a product name from the database, and count the total number of hits per product name. Results of matching apple iphone 4s => array('apple', 'iphone', '4s') against:

    • iphone-4s gives 2 hits
    • galaxy s4 gives 0 hits
    • iphone 3g gives 1 hit
    • apple nano gives 1 hit
    • samsung anything 4s gives 1 hit
  2. Sort the candidates by number of hits. Here iphone-4s has the most hits, so it is the most likely match to suggest to the supervisor.

  3. In addition, it might make sense to remove all spaces and special characters from the names already stored in the database, because of the following scenario: my new product name could be apple iphone while the stored name is apple i-phone. Then there would be only one hit (apple) instead of two. Removing every non-alphanumeric character from the stored name would increase the hit rate: the stored entry would become appleiphone, so after exploding the new product name apple iphone into array('apple', 'iphone'), both words would be found and there would be two hits.
  4. As a further addition, I thought of removing things like colors from all names before matching, since I don't care about them and want to match two products regardless of their color.
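A minimal sketch of steps 1 to 4, written in Python for brevity (in PHP, `preg_replace`, `explode` and `strpos` would play the same roles). It assumes that a "hit" means a word of the new name occurring as a case-insensitive substring of the stored name, which is what the hit counts above imply. The function names and the colour list `IGNORED_WORDS` are hypothetical, chosen for illustration only.

```python
import re

# Hypothetical list of colour words to ignore (step 4); extend as needed.
IGNORED_WORDS = {"black", "white", "silver", "gold", "red", "blue"}


def tokenize(name):
    """Step 1: lowercase, turn every run of special chars into a space, split."""
    words = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    # Step 4: drop words we do not care about, such as colours.
    return [w for w in words if w not in IGNORED_WORDS]


def squash(name):
    """Step 3: stored name without spaces, special chars or ignored words."""
    return "".join(tokenize(name))


def count_hits(new_name, stored_name):
    """Number of words of the new name found as substrings of the stored name."""
    target = squash(stored_name)
    return sum(1 for word in tokenize(new_name) if word in target)


def suggest(new_name, products):
    """Step 2: (hits, id, name) tuples with at least one hit, most hits first."""
    scored = [(count_hits(new_name, name), pid, name) for pid, name in products]
    return sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))


# The sample database from the question as (id, name) pairs.
PRODUCTS = [(1, "iphone-4s"), (2, "galaxy s4"), (3, "iphone 3g"),
            (4, "apple nano"), (5, "samsung anything 4s")]

for hits, pid, name in suggest("apple iphone-4s", PRODUCTS):
    print(pid, name, hits)
```

With the sample data this ranks iphone-4s (2 hits) first, followed by iphone 3g, apple nano and samsung anything 4s (1 hit each), matching the counts listed above; galaxy s4 is dropped. One trade-off of squashing stored names is that very short words (such as a lone "s") can match across former word boundaries, so a minimum word length may be worth adding.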

Do you have better ideas?


  • doulan0297 2014-04-18 08:33

    You may want to consider the Levenshtein distance function:

    http://www.php.net/manual/en/function.levenshtein.php

    This is the kind of measure search engines use for fuzzy matching, i.e. to return results similar to the words you type in. I don't know how you could support this in MySQL, but I have used it quite successfully with Solr indexes. Hope this helps.
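PHP's built-in `levenshtein()` (linked above) computes this distance directly. To illustrate what it measures, here is a minimal Python sketch of the same classic dynamic-programming algorithm, used to rank stored names against a new one. The normalization step (lowercase, strip non-alphanumerics, borrowed from the question's step 3), the `max_distance=3` cut-off and the function names are assumptions for illustration, not part of the answer.

```python
import re


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions turning a into b (dynamic programming, O(len(a) * len(b)))."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string as the row
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def normalize(name):
    """Lowercase and strip everything that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def closest(new_name, products, max_distance=3):
    """(distance, id, name) tuples within max_distance, closest first."""
    target = normalize(new_name)
    scored = [(levenshtein(target, normalize(name)), pid, name)
              for pid, name in products]
    return sorted(s for s in scored if s[0] <= max_distance)


# The sample database from the question as (id, name) pairs.
PRODUCTS = [(1, "iphone-4s"), (2, "galaxy s4"), (3, "iphone 3g"),
            (4, "apple nano"), (5, "samsung anything 4s")]

print(closest("i-phone 4s", PRODUCTS))
```

For "i-phone 4s" this prints `[(0, 1, 'iphone-4s'), (2, 3, 'iphone 3g')]`. The second entry shows a limitation: a different model can be only two edits away, so a distance threshold alone cannot decide a match, which fits the plan of having a supervisor review the suggestions.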

    The asker selected this answer as the best answer.
