Using the title to determine the likely category in SphinxQL

I have a database with over 60 million records indexed by SphinxQL 2.1.1. Each record has a title and a catid (among other things). When a new record is inserted into the database, I am trying to get sphinx to guess the catid based on the text in the title.

I have managed to get it working for single words like so:

SELECT @groupby, catid, count(*) c FROM sphinx WHERE MATCH('*LANDLORDS*') group by catid order by c desc

However the actual title is likely to be something like this:

Looking for Landlords - Long term lease - No fees!!!

Is there any way to just dump the whole title string into sphinx and have it break down each of the words and perform some sort of fuzzy match, returning the most likely category?


1 answer

donglu4633 2014-06-23 14:38
Well, Sphinx as such isn't 'magical', and it doesn't have a 'fuzzy match' function.

But you can approximate one :) There are two main steps:

1. Change from requiring all of the words to requiring just some of them.

2. Change the ranking so that documents with the best 'intersection' between the query and the title get a high weight, and therefore 'bubble' to the top.

You can then just take the top result and treat it as a 'best guess'.

(There is actually a third step: words like 'for' and 'the' are likely to cause lots of false positives, so you may want to exclude them, either by using stopwords on the index or by just stripping them from the query.)
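As a rough illustration of the "strip them from the query" option, here is a minimal Python sketch that runs on the client before the query is sent. The function name `extract_keywords` and the tiny `STOPWORDS` set are assumptions made for this example; in practice you would probably reuse the stopword list your index is configured with, and the ASCII-only tokenizer would need adjusting for non-English titles.

```python
import re

# Assumed stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

def extract_keywords(title, stopwords=STOPWORDS):
    """Split a title into words, dropping punctuation and stopwords."""
    # Keep runs of ASCII letters and digits; everything else is a separator.
    words = re.findall(r"[A-Za-z0-9]+", title)
    return [w for w in words if w.lower() not in stopwords]

keywords = extract_keywords("Looking for Landlords - Long term lease - No fees!!!")
print(keywords)
```

For the example title this leaves Looking, Landlords, Long, term, lease, No and fees, which are the words used in the queries in this answer.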

A prototype of such a query might be something like

SELECT catid FROM sphinx WHERE MATCH('"Looking Landlords Long term lease No fees"/1') LIMIT 1 OPTION ranker=wordcount;

That uses the quorum operator to relax the matching, and chooses a different ranker.
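If the query is built in application code, something along these lines could assemble it from a list of keywords. This is only a sketch: `build_quorum_query` is a hypothetical helper, the index name `sphinx` and the `catid` column come from the question, and the filter that keeps only plain alphanumeric words is an assumption chosen so that nothing from the title can be read as a query operator or a quote.

```python
import re

def build_quorum_query(words, threshold=1, index="sphinx"):
    """Build a SphinxQL statement matching at least `threshold` of `words`."""
    # Keep only plain alphanumeric tokens so that no quote or operator
    # character from the title leaks into the MATCH() expression.
    safe = [w for w in words if re.fullmatch(r"[A-Za-z0-9]+", w)]
    if not safe:
        raise ValueError("no usable keywords")
    # The quorum threshold cannot usefully exceed the number of words.
    threshold = max(1, min(threshold, len(safe)))
    match_expr = '"{}"/{}'.format(" ".join(safe), threshold)
    # LIMIT is placed before OPTION in the generated statement.
    return (
        "SELECT catid FROM {} WHERE MATCH('{}') "
        "LIMIT 1 OPTION ranker=wordcount"
    ).format(index, match_expr)

print(build_quorum_query(["Looking", "Landlords", "Long", "term", "lease", "No", "fees"]))
```

Passing a larger `threshold` gives a stricter variant of the same query without touching the rest of the code.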

Using this version with grouping probably won't work well, as it will include lots of low-quality matches. Although you could perhaps try using AVG or SUM to get a composite weight?

SELECT SUM(WEIGHT()) AS w, catid FROM sphinx WHERE MATCH('"Looking Landlords Long term lease No fees"/1') GROUP BY catid ORDER BY w DESC LIMIT 1 OPTION ranker=wordcount;

There are lots of ways to tweak this...

You can try other rankers, e.g. matchany, or even a custom ranking expression.

Or change the quorum threshold, e.g. rather than requiring just 1 word, you could require at least a few.

Or, if you can extract phrases, e.g.

'"Looking Landlords" | "Long term lease" | "No fees"'

might work?
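One way the phrases might be extracted is to treat the punctuation in the title as phrase boundaries. The Python sketch below does that; `build_phrase_query` and the small `STOPWORDS` set are assumptions for illustration, and titles with apostrophes or without any punctuation would need a different splitting rule.

```python
import re

# Assumed stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

def build_phrase_query(title, stopwords=STOPWORDS):
    """Turn a punctuated title into an OR of quoted phrases."""
    # Any run of non-alphanumeric, non-space characters separates phrases.
    chunks = re.split(r"\s*[^A-Za-z0-9\s]+\s*", title)
    phrases = []
    for chunk in chunks:
        words = [w for w in chunk.split() if w.lower() not in stopwords]
        if words:  # skip empty chunks, e.g. after trailing punctuation
            phrases.append('"{}"'.format(" ".join(words)))
    return " | ".join(phrases)

print(build_phrase_query("Looking for Landlords - Long term lease - No fees!!!"))
```

For the example title this produces the same three-phrase expression as above; the result would then go inside MATCH('...').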

Also, rather than just taking the top result, you could take the top 5-10 results and show them all to the user, which compensates for the fact that the results are very approximate.
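A possible client-side variant of this idea: fetch the best few dozen matches together with their weights, then add the weights up per category and show the leading categories. The sketch below assumes the rows have already been fetched as `(catid, weight)` pairs; `rank_categories` is a hypothetical helper and `sample_rows` is invented data, not real query output.

```python
from collections import defaultdict

def rank_categories(rows, top_n=5):
    """Sum weights per category and return the top_n (catid, total) pairs."""
    totals = defaultdict(int)
    for catid, weight in rows:
        totals[catid] += weight
    # Highest combined weight first; ties broken by catid for stable output.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]

# Invented (catid, weight) pairs standing in for the top matches.
sample_rows = [(12, 5), (7, 4), (12, 4), (3, 4), (7, 1)]
print(rank_categories(sample_rows, top_n=2))  # [(12, 9), (7, 5)]
```

Because only the best matches are aggregated, the long tail of weak single-word hits does not get a vote.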
This answer was selected by the asker as the best answer.


Tags: mysql, php, sphinx
Question posted: 2014-06-22 09:04
