dongyumiao5210 2012-03-22 18:43

Tuning Sphinx match-any / partial matching [via PHP]

We're running Sphinx on a mid-sized product database (10 million records, 2 GB) using the standard SPH_MATCH_EXTENDED2 / SPH_RANK_PROXIMITY_BM25 approach. Speed is great and relevancy is spot on.

However, we're getting a growing number of complaints from end users whose search terms are more specific than what our database contains, and who therefore get no results.

For example, we have the product "KitchenAid Artisan 5-Quart Mixers" while a common search is "KitchenAid Artisan 5-Quart Stand Mixers brown". The result with our current settings is no match when we should be able to return the item we have.

We've tried SPH_MATCH_ANY mode, sorting by @weight, but relevancy goes completely sideways [think dolls and board games showing up] as Sphinx picks up other products that match only individual words.

Is there a best practice way to build our query parameters that will allow for more open matching while still ranking off of proximity and word density?

Here are our current PHP API calls, if that helps:

$cl = new SphinxClient();
$cl->SetServer('1.23.4', 456);
$cl->SetMaxQueryTime(15000);
$cl->SetMatchMode(SPH_MATCH_EXTENDED2);
$cl->SetRankingMode(SPH_RANK_PROXIMITY_BM25);
$cl->SetArrayResult(true);
$cl->SetFilter('active', array(1)); 
$cl->SetSortMode(SPH_SORT_RELEVANCE, '@weight DESC, priced ASC');
$cl->SetLimits(intval($try), 1, 20, 500);
$cl->SetFieldWeights(array('ptitle' => 60, 'description' => 40));
$res = $cl->query($searchterm,"products");

1 answer

  • doulanyan6455 2012-03-22 20:36

    One thing to explore is the quorum operator. It is useful for long queries because you can require a certain number of keywords to match. While ANY requires only one word to match, quorum can require, say, 4 out of 7.

    This rules out many of the really bad matches right away.

    And because quorum is just syntax within the extended match mode, you can try all the different ranking modes. SPH_RANK_MATCHANY is still available to try, as it should be reasonably good with 'partial' matches, but you can also try the other modes.

    If you are using morphology, you can also enable index_exact_words and give exact word forms a boost in the rankings.

    So I would do something like:

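    // This works as long as the user is not using special query syntax;
    // if the input may contain - = " ( ) etc., you need to be more clever.
    $bits = preg_split('/\s+/', trim($searchterm));
    // Require roughly two thirds of the keywords to match.
    $quorum = ceil(count($bits) * 0.66);
    // The same keywords, each prefixed with the exact-form operator (=).
    $searchterm2 = '=' . implode(' =', $bits);

    // Match either the plain quorum or the exact-form quorum.
    $searchterm = '"' . $searchterm . '"/' . $quorum . ' | "' . $searchterm2 . '"/' . $quorum;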
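For the "need to be more clever" case, one rough option is to strip anything that could be read as query syntax before building the quorum expression. The sketch below is only an illustration: `build_quorum_query` is a hypothetical helper (not part of the Sphinx API), and it assumes that hyphens and other punctuation are treated as separators by your charset_table, so that "5-Quart" counts as two keywords.

```php
<?php
// Hypothetical helper, not part of the Sphinx API.
// Builds '"words"/N | "=words"/N' from raw user input.
function build_quorum_query(string $raw, float $ratio = 0.66): string
{
    // Assumption: anything that is not a letter, digit or underscore is
    // either a separator or query syntax, so collapse it into one space.
    $clean = trim(preg_replace('/[^\p{L}\p{N}_]+/u', ' ', $raw) ?? '');
    if ($clean === '') {
        return ''; // nothing searchable left
    }

    $words  = explode(' ', $clean);
    // Require roughly $ratio of the keywords, rounded up.
    $quorum = (int) ceil(count($words) * $ratio);

    $plain = implode(' ', $words);
    $exact = '=' . implode(' =', $words); // exact-form operator on each word

    return '"' . $plain . '"/' . $quorum . ' | "' . $exact . '"/' . $quorum;
}
```

The result would then be passed as the query string, e.g. `$cl->Query(build_quorum_query($searchterm), 'products')`. Stripping is cruder than escaping, but it keeps the keyword count used for the quorum in line with what is actually sent.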
    

    Also, I have doubts about your SetLimits call. A max_matches of 20 seems very low, and the cutoff looks unnecessary; it might even be causing your issues. With a cutoff of 500, Sphinx will find 500 reasonable documents and then stop searching, even if there are better matches later in the dataset.
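To make the argument order concrete: in the PHP API, SetLimits takes offset, limit, max_matches and cutoff, in that order. Below is a minimal sketch of computing those four values for a results page. `sphinx_page_limits` is a hypothetical helper, and the default of 1000 for max_matches is an assumption that should be checked against your searchd configuration.

```php
<?php
// Hypothetical helper, not part of the Sphinx API.
// Returns [offset, limit, max_matches, cutoff] for a 1-based page number.
function sphinx_page_limits(int $page, int $perPage = 20, int $maxMatches = 1000): array
{
    $page   = max(1, $page);
    $offset = ($page - 1) * $perPage;

    // Rows beyond max_matches cannot be returned, so clamp to the last window.
    if ($offset >= $maxMatches) {
        $offset = max(0, $maxMatches - $perPage);
    }
    $limit = min($perPage, $maxMatches - $offset);

    $cutoff = 0; // 0 means no cutoff: keep searching the whole index

    return [$offset, $limit, $maxMatches, $cutoff];
}
```

It could be used as `$cl->SetLimits(...sphinx_page_limits($page));`, which leaves the cutoff disabled so better matches later in the dataset are not skipped.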

    This answer was accepted by the asker as the best answer.
