weixin_39597399
2020-12-28 14:30 Views: 1

Match threshold

Could you add an optional parameter to find that lets you set a match threshold? I am looking only for matches that are really close, rather than getting matches with a similarity of 0.2 or 0.3. By setting the threshold I could eliminate anything that isn't almost exactly a match.

This question comes from the open-source project: seamusabshere/fuzzy_match


8 answers

  • weixin_39847244 2020-12-28 14:30

    hey i'll look into this ASAP

  • weixin_39847244 2020-12-28 14:30

    hey @danyal, we're trying out a similar feature in #3 (https://github.com/seamusabshere/fuzzy_match/issues/3)

    
    gem 'fuzzy_match', github: 'seamusabshere/fuzzy_match', branch: 'find_all_with_score'
    

    you get 2 scores for every record, a Dice similarity and a Levenshtein similarity (because sometimes pair distance, aka Dice's coefficient, can't tell things apart on its own)

    
    fz = FuzzyMatch.new [...]
    fz.find_all_with_score('foobar').each do |record, dice_similar, leven_similar|
      [...]
    end
    

    it returns all scores, so you can do the threshold yourself:

    
    fz.find_all_with_score('foobar').select do |record, dice_similar, leven_similar|
      dice_similar > 0.3
    end
    

    is this sufficient for your needs? if so, i'll put it in a new gem release
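To make the two scores and the do-it-yourself threshold concrete, here is a minimal, self-contained sketch in plain Ruby. It does not use the fuzzy_match gem, and the gem's own scoring may well differ in detail. The helper names (`bigrams`, `dice_similarity`, `levenshtein_similarity`, `all_with_scores`, `matches_above`) are hypothetical and exist only for this illustration. The sketch assumes Dice's coefficient is computed over character bigrams and that Levenshtein distance is normalized by the longer string's length.

```ruby
# Illustration only: NOT the fuzzy_match gem's implementation.

# Character bigrams of a string, e.g. "night" -> ["ni", "ig", "gh", "ht"].
def bigrams(str)
  str.downcase.each_char.each_cons(2).map(&:join)
end

# Dice's coefficient over bigram multisets: 2 * |A ∩ B| / (|A| + |B|).
def dice_similarity(a, b)
  x = bigrams(a)
  y = bigrams(b)
  return 0.0 if x.empty? || y.empty?

  remaining = y.dup
  # Count each shared bigram once, consuming it from the other side.
  overlap = x.count do |pair|
    idx = remaining.index(pair)
    idx && remaining.delete_at(idx)
  end
  2.0 * overlap / (x.size + y.size)
end

# Classic dynamic-programming edit distance, keeping one row at a time.
def levenshtein_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Edit distance rescaled to 0..1, where 1.0 means identical strings.
def levenshtein_similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?

  1.0 - levenshtein_distance(a.downcase, b.downcase).to_f / longest
end

# Mirrors the shape of the triples shown above: [record, dice, leven].
def all_with_scores(haystack, needle)
  haystack.map do |record|
    [record, dice_similarity(record, needle), levenshtein_similarity(record, needle)]
  end
end

# Keep only records whose Dice similarity is strictly above the threshold.
def matches_above(haystack, needle, threshold)
  all_with_scores(haystack, needle).select { |_record, dice, _leven| dice > threshold }
end

matches_above(%w[foobar foobaz quux], "foobar", 0.7).each do |record, dice, leven|
  puts format("%-8s dice=%.2f leven=%.2f", record, dice, leven)
end
```

A high cutoff such as 0.7 keeps only near-exact matches, which is what the original question asks for. The 0.3 used in the thread's example is far more permissive.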

  • weixin_39597399 2020-12-28 14:30

    Hi Seamus,

    Yup, that would serve my purpose, though I think it would be cleaner if find took an optional threshold param; then on line 264 in fuzzy_match.rb you could change the "> 0" to "> threshold". Either way works though.

    Thanks, Danyal

    (Sent by email in reply to Seamus Abshere's comment of Wed, Jan 9, 2013, 7:40 AM.)
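The suggestion in this reply, an optional threshold on find that replaces a hard-coded "> 0" comparison, could look roughly like the sketch below. This is not the gem's code. `ThresholdMatcher`, its `threshold:` keyword, and the `dice` lambda are hypothetical names invented for the example, and the scorer is a simple set-based bigram Dice coefficient assumed purely for illustration.

```ruby
# Hypothetical stand-in for a matcher; NOT the fuzzy_match gem's own class.
class ThresholdMatcher
  # The scorer block takes (record, needle) and returns a similarity in 0..1.
  def initialize(haystack, &scorer)
    @haystack = haystack
    @scorer = scorer
  end

  # Returns the best-scoring record whose score is strictly greater than
  # +threshold+, or nil when nothing clears it. The default of 0 mirrors the
  # "> 0" comparison mentioned in the thread.
  def find(needle, threshold: 0)
    best_record, best_score = @haystack
      .map { |record| [record, @scorer.call(record, needle)] }
      .max_by { |_record, score| score }
    best_score && best_score > threshold ? best_record : nil
  end
end

# Assumed scorer: Dice's coefficient over sets of character bigrams.
dice = lambda do |a, b|
  x = a.downcase.each_char.each_cons(2).to_a.uniq
  y = b.downcase.each_char.each_cons(2).to_a.uniq
  next 0.0 if x.empty? || y.empty?

  2.0 * (x & y).size / (x.size + y.size)
end

matcher = ThresholdMatcher.new(%w[foobar foobaz quux], &dice)
p matcher.find("foobaz")                  # default threshold of 0
p matcher.find("fooba", threshold: 0.9)   # stricter cutoff
```

Keeping the default at 0 means existing callers see no change in behavior, while callers who want near-exact matches can opt in to a stricter cutoff.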

  • weixin_39847244 2020-12-28 14:30

    hey @danyal, how's this lookin'? https://github.com/seamusabshere/fuzzy_match/commit/8e11cfe0628c15b309a1f8a3137f5ba8544ed51d

  • weixin_39597399 2020-12-28 14:30

    Looks great! Thanks so much.

    (Sent by email in reply to Seamus Abshere's comment of Wed, Jan 9, 2013, 6:29 PM.)

  • weixin_39847244 2020-12-28 14:30

    can you use this as a github branch? if not, i can rush a gem release.

  • weixin_39597399 2020-12-28 14:30

    I can use the branch, no need to rush.

    (Sent by email in reply to Seamus Abshere's comment of Wed, Jan 9, 2013, 7:47 PM.)

  • weixin_39847244 2020-12-28 14:30

    fixed since version 1.4.1 i believe (https://github.com/seamusabshere/fuzzy_match/commit/c2e6f3e3eb0aef442e0c28ef2ffb8a1536b6f442)
