dongyumiao5210 2012-03-22 18:43

Tuning Sphinx for match-any / partial matching [via PHP]

We're running Sphinx on a mid-sized product database (10 million records, 2 GB) using the standard SPH_MATCH_EXTENDED2 / SPH_RANK_PROXIMITY_BM25 approach. Speed is great and relevancy is spot on.

However, we're getting more and more complaints from end users whose queries contain more terms than the matching product record has, and who therefore get no results.

For example, we have the product "KitchenAid Artisan 5-Quart Mixers" while a common search is "KitchenAid Artisan 5-Quart Stand Mixers brown". The result with our current settings is no match when we should be able to return the item we have.

We've tried SPH_MATCH_ANY mode with sorting by @weight, but relevancy goes completely sideways [think dolls and board games showing up] because Sphinx picks up other products that match only individual words.

Is there a best practice way to build our query parameters that will allow for more open matching while still ranking off of proximity and word density?

Here are our current PHP API calls, if that helps:

$cl = new SphinxClient();
$cl->SetServer('1.23.4', 456);
$cl->SetMaxQueryTime(15000);
$cl->SetMatchMode(SPH_MATCH_EXTENDED2);
$cl->SetRankingMode(SPH_RANK_PROXIMITY_BM25);
$cl->SetArrayResult(true);
$cl->SetFilter('active', array(1)); 
$cl->SetSortMode(SPH_SORT_RELEVANCE, '@weight DESC, priced ASC');
$cl->SetLimits(intval($try), 1, 20, 500);
$cl->SetFieldWeights(array('ptitle' => 60, 'description' => 40));
$res = $cl->query($searchterm,"products");

1 answer

  • doulanyan6455 2012-03-22 20:36

    One thing to explore is the quorum operator. It is useful for long queries because you can require a minimum number of keywords to match. While ANY requires only one word to match, quorum can require, say, 4 out of 7.

    This rules out many of the really bad matches right away.

    And because quorum is just syntax within extended match mode, you can try all the different ranking modes. SPH_RANK_MATCHANY is still available, and it should be reasonably good with 'partial' matches, but you can also try the other modes.

    If you are using morphology, you can also enable index_exact_words and give exact-form matches a boost in the rankings.

    So I would do something like:

    // This works as long as the user is not using special syntax; if the query contains -, =, ", ( or ), the code needs to be more clever.
    $bits = preg_split('/\s+/',trim($searchterm));
    $quorum = ceil(count($bits)*0.66);
    $searchterm2 = '='.implode(' =',$bits);
    
    $searchterm = '"'.$searchterm.'"/'.$quorum.' | "'.$searchterm2.'"/'.$quorum;
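
The comment above notes that the snippet breaks once the user types operator characters. A minimal sketch of one way to handle that, assuming it is acceptable to drop those characters rather than honor them. `build_quorum_query` and its 0.66 default ratio are illustrative names and values, not part of the Sphinx API, and the `=` branch only helps if index_exact_words is enabled:

```php
<?php
// Hypothetical helper: builds the same two-branch quorum query as the
// snippet above, after removing characters that act as operators in
// Sphinx extended query syntax.
function build_quorum_query(string $input, float $ratio = 0.66): string
{
    // Replace common extended-syntax operator characters with spaces.
    $clean = preg_replace('#["()|!@~/^$=\\\\]#', ' ', $input);

    $bits = [];
    foreach (preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        // A leading "-" would be read as NOT; hyphens inside a word are kept.
        $word = ltrim($word, '-');
        if ($word !== '') {
            $bits[] = $word;
        }
    }
    if (count($bits) === 0) {
        return '';
    }

    // Require roughly two thirds of the keywords to match.
    $quorum = (int) ceil(count($bits) * $ratio);
    $plain  = implode(' ', $bits);
    $exact  = '=' . implode(' =', $bits);

    return '"' . $plain . '"/' . $quorum . ' | "' . $exact . '"/' . $quorum;
}
```

For the query from the question, `build_quorum_query('KitchenAid Artisan 5-Quart Stand Mixers brown')` yields `"KitchenAid Artisan 5-Quart Stand Mixers brown"/4 | "=KitchenAid =Artisan =5-Quart =Stand =Mixers =brown"/4`, which can be passed to `$cl->query()` in place of the raw search term.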
    

    Also, I have doubts about your SetLimits call. A max_matches of 20 seems very low, and the cutoff looks unnecessary; it might even be causing your issues. With a cutoff of 500, Sphinx stops searching once it has found 500 matching documents, even if there are better matches later in the dataset.
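
As a rough illustration of the point about limits, assuming the four-argument `SetLimits($offset, $limit, $max, $cutoff)` signature from the stock `sphinxapi.php`. The `page_limits` helper, the page size of 20, and the max_matches value of 1000 are assumptions for the sketch; the per-query max_matches also has to fit within whatever the searchd configuration allows:

```php
<?php
// Hypothetical helper: returns [offset, limit, max_matches, cutoff]
// for one page of results. Names and defaults are illustrative.
function page_limits(int $page, int $perPage = 20, int $maxMatches = 1000): array
{
    $page = max(1, $page);

    // Keep the requested window inside max_matches.
    $offset = min(($page - 1) * $perPage, max(0, $maxMatches - $perPage));

    // A cutoff of 0 disables the early stop, so later, better matches
    // still get a chance to be ranked.
    return [$offset, $perPage, $maxMatches, 0];
}

// Usage with an existing SphinxClient instance $cl:
// list($offset, $limit, $max, $cutoff) = page_limits(3);
// $cl->SetLimits($offset, $limit, $max, $cutoff);
```

Dropping the cutoff costs some query time on very broad searches, so it is worth measuring against the 15-second SetMaxQueryTime already set in the question.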

    This answer was accepted by the asker as the best answer.
