dongxiji0687 2012-11-18 16:05
浏览 44
已采纳

如何智能地比较两个表?

Consider following are my two sql tables:

Table 1                                                                       Table 2

+-------+-------------------------+   +-------+------------------------------+
| USD   | Model                   |   | USD   | Model                        |
+-------+-------------------------+   +-------+------------------------------+
| 700   | iPad 2 WiFi 16GB        |   | 710   | iPad2 WiFi 16GB              |
| 400   | iPhone 4S 16GB          |   | 450   | iPhone4S 16GB                |
| 250   | iPod Touch(4th Gen)8GB  |   | 200   |iPod Touch 4th Generation 8GB |
+-------+-------------------------+   +-------+------------------------------+

I am stuck in comparing the data present in two different tables intelligently. I dug alot on the context of searching or comparing and I found

  • similar_text()
  • soundex()
  • metaphones()
  • LEVENSHTEIN()
  • like
  • fulltext
  • regexp

...in PHP and MySQL but they all are not efficient. Because similar_text and LEVENSHTEIN are really good, but the worst drawback is that they are extremely slow for 1000 rows, soundex() and metaphones return the same sound for such items which are not alike, like "iphone" and "ipad", both are not same etc. All I want to do is compare efficiently two rows that are alike like from the above example "iPhone 4S 16GB" and "iPhone4S 16GB" is the same or the like mentioned above and my solution should work quickly to compare such rows. Kindly let me know what are my options for comparing so I can solve my query. I would really appreciate any idea, any hint.

Note: My one table contains around ~900 rows.

This is a continuation of:

Compare two arrays and sort WRT USD

Pattern comparing with mysql between two tables column

  • 写回答

4条回答 默认 最新

  • dscuu86620 2012-11-18 17:20
    关注

    I covered this kind of thing when doing a spam detector (loads of research, and then ditched the idea later, but moving on...).

    Basically, do not use like, it's slow on large text and indexes are limited for example:

    LIKE '%hello' can not use an index, but, LIKE 'hello%' can. Also, large fields will result in large indexs to make the work as you intend (they are ok for say email addresses which tend to be short).

    Use = which will also be case insensitive which you must have for this.

    Next, add a new field to the tables which contains the already parsed metaphone() representation (this means that it only has to be calculated ONCE).

    Now you have a table of say 1000 records, each with their metaphone version as well as the original. You MUST do this to get the efficiency you require. When you want to see if some text already exists, you just convert the new text to it's metephone version, then search the db tables for it (searching on the metephone parsed field). Much quicker ;)

    To improve accuracy, you may want to delete all the common words and remove punctuation such as:

    • and = deleted
    • , = deleted
    • ' = deleted
    • has = deleted
    • it's = its or it is (depending on which you prefer)

    Then combine all multiple whitespace such as 5 spaces, into just 1 space.

    The nature of what you are doing will have hundreds of little tweaks you can do to perfect it for what you need it for.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(3条)

报告相同问题?

悬赏问题

  • ¥15 关于#Java#的问题,如何解决?
  • ¥15 加热介质是液体,换热器壳侧导热系数和总的导热系数怎么算
  • ¥15 想问一下树莓派接上显示屏后出现如图所示画面,是什么问题导致的
  • ¥100 嵌入式系统基于PIC16F882和热敏电阻的数字温度计
  • ¥15 cmd cl 0x000007b
  • ¥20 BAPI_PR_CHANGE how to add account assignment information for service line
  • ¥500 火焰左右视图、视差(基于双目相机)
  • ¥100 set_link_state
  • ¥15 虚幻5 UE美术毛发渲染
  • ¥15 CVRP 图论 物流运输优化