Is there a better way to fetch data from two tables at once with Sphinx / MySQL?

Before asking my question, it is important to explain what I am actually doing.

The best comparison to the feature I am implementing would be Facebook's search feature. When you begin typing, a drop-down list appears with various search results. At the top you will find your friends whose names match your search, then other people who match, then pages, events, etc.

My situation is similar; however, I only want to search for two things: users and documents (named ripples in the code below).

I have this working fine. Please bear with me while I talk through the logic of this feature in my case:

1. The user focuses on the search input.
2. An Ajax request retrieves the logged-in user's friends/followers/following and caches them client side. (This only occurs the first time a user focuses on the search input.)
3. As the user types, a highly optimized function performs a regex against the array of usernames and builds an autocomplete list complete with avatars etc.
4. At the same time, and for every keypress, an Ajax request is fired to the script below, which does the following:
   - Performs two separate Sphinx searches on two separate indexes: one to collect user IDs and the other to collect document IDs (ripple IDs).
   - Loops through the results of the users query, checking against an array of user IDs that were sent in the Ajax request, to avoid duplicating users that were already displayed during the initial high-speed friends/followers check.
   - Queries the actual database to get the user data for the remaining user IDs.
   - Repeats the same process for the documents (ripples).

Finally, any returned results are appended to the autocomplete list.

This is an example of the PHP function that performs the sphinx lookups and gets the data from the database.

    public function search()
    {
        $this->disableLayout();
        $request = new Request();
        $params = $request->getParams(GET);

        // Perform the Sphinx text search against both indexes
        include('/usr/local/lib/php/sphinxapi.php');
        $sphinx = new \SphinxClient();
        $sphinx->SetMatchMode(SPH_MATCH_ANY);
        $sphinx->SetLimits(0, 4);
        // Note: this connection handle is not used anywhere below
        $mysqlconn = mysql_connect("127.0.0.1:9306") or die ("Couldn't connect to MySQL.");
        $users = $sphinx->Query($params['data']['q'], "users");
        $ripples = $sphinx->Query($params['data']['q'], "ripples");

        /*
         * USERS
         */

        // Collect only the user IDs that the client has not already displayed.
        // Building an array and imploding it avoids the stray or doubled commas
        // that string concatenation produced whenever an ID was skipped.
        $ids = array();
        if (!empty($users["matches"])) {
            foreach ($users['matches'] as $id => $data) {
                if (!isset($params['data']['e'][$id])) {
                    $ids[] = (int) $id;
                }
            }
        }

        // If there are any remaining IDs, collect the data from the database
        if (!empty($ids)) {
            $userdataquery = "select users.userid, users.firstname, users.lastname
                              from tellycards_user_data users
                              where userid IN(" . implode(",", $ids) . ")";
            $query = new Query($userdataquery);
            $usersoutput = $query->fetchAll();
        }

        /*
         * RIPPLES
         */

        // Collect all matching ripple IDs
        $rippleids = array();
        if (!empty($ripples["matches"])) {
            foreach ($ripples['matches'] as $id => $data) {
                $rippleids[] = (int) $id;
            }
        }

        // If there are any IDs, collect the data from the database
        if (!empty($rippleids)) {
            $rippledataquery = "select ripples.id, ripples.name, ripples.screenshot
                                from tellycards_ripples ripples
                                where id IN(" . implode(",", $rippleids) . ")";
            $query = new Query($rippledataquery);
            $ripplesoutput = $query->fetchAll();
        }

        // Return both result sets as JSON
        header('Content-Type: application/json');
        echo json_encode(array(
            'users'   => (!empty($usersoutput)) ? $usersoutput : null,
            'ripples' => (!empty($ripplesoutput)) ? $ripplesoutput : null
        ));
    }

You might ask why we do the initial friends lookup instead of just using Sphinx for everything. With the method above, the user gets instant feedback while typing because the array of friends is stored client side; despite the fantastic speed of Sphinx, there will inevitably be some lag due to the HTTP request. In practice it works fantastically, and incidentally it appears to be the method that Facebook uses too.

Also, there is a lot of JavaScript code that prevents unnecessary lookups, adds the returned data to the cache and so on, so that future searches do not need to hit Sphinx or the database.

Now finally onto my actual question....

This current server-side function bothers me a lot. Right now there are two searches being performed by Sphinx and two searches being performed by MySQL. How can I collate all this into one Sphinx query and one MySQL query? Is there any way at all? (Please bear in mind that documents and users may share the same primary key IDs, as they are in two completely different tables in MySQL and are currently spread across two separate indexes.) Or is there any way to combine the two MySQL queries to make them more efficient than two separate selects?

Or alternatively... Given the simplicity of the queries, am I best keeping them separate as above? (Both are indexed primary key queries.)

I guess what I am asking for is any recommendations/advice.

Any commentary is very welcome.

dongyi1490 2012-10-12 08:23
You can't really get away from having two MySQL queries. Well, you could, either by just combining them into one with UNION, or by creating a new combined 'table' (either a view or a materialized view), but I really don't think it's worth the effort. Two queries is perfectly fine; as you say, they are indexed.
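For illustration only, here is a rough sketch of what the UNION route could look like in PHP. The function name `buildCombinedQuery` and the `kind` label column are made up for this example; only the table and column names come from the question. It uses UNION ALL rather than plain UNION, on the assumption that the `kind` label already makes every row distinct, so no de-duplication pass is needed.

```php
<?php
// Hypothetical helper (not part of the original code): builds a single
// UNION ALL statement that fetches users and ripples in one round trip.
// The 'kind' column is an assumed label so the caller can split the rows again.
function buildCombinedQuery(array $userIds, array $rippleIds)
{
    // Force every ID to an integer so the IN() lists are safe to interpolate.
    $userIds = array_map('intval', $userIds);
    $rippleIds = array_map('intval', $rippleIds);

    $parts = [];
    if ($userIds) {
        $parts[] = "SELECT 'user' AS kind, userid AS id, firstname AS one, lastname AS two"
                 . " FROM tellycards_user_data WHERE userid IN (" . implode(',', $userIds) . ")";
    }
    if ($rippleIds) {
        $parts[] = "SELECT 'ripple' AS kind, id, name AS one, screenshot AS two"
                 . " FROM tellycards_ripples WHERE id IN (" . implode(',', $rippleIds) . ")";
    }

    // Null means there is nothing to look up at all.
    return $parts ? implode(' UNION ALL ', $parts) : null;
}
```

The caller would then run the returned string once and group the rows by `kind`. Whether that is actually faster than two primary key lookups is something you would have to measure.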

You could use one Sphinx index (and hence one search query) by creating a new combined index. Because you say your keys are not unique, you would have to create a new synthetic key.

e.g.

    sql_query = SELECT userid*2 AS id, 1 AS table_id, firstname AS one, lastname AS two FROM tellycards_user_data \
        UNION \
        SELECT (id*2)+1 AS id, 2 AS table_id, name AS one, screenshot AS two FROM tellycards_ripples
    sql_attr_uint = table_id

This gives you a fake key and an attribute that identifies which table the result came from. You can use these to get back to the original row in the original table. (There are many other ways of doing the same thing.)

This allows you to run one query and get combined results.
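As a rough sketch of the decoding side, assuming the key scheme from the sample `sql_query` (users become `userid*2`, ripples become `(id*2)+1`): even keys map back to users and odd keys to ripples. The function name `decodeSyntheticId` is hypothetical, and in practice you could read the `table_id` attribute from the match instead of checking parity.

```php
<?php
// Hypothetical helper: reverses the synthetic key used in the combined index.
// Assumes users were indexed as userid*2 and ripples as (id*2)+1.
function decodeSyntheticId($syntheticId)
{
    $syntheticId = (int) $syntheticId;

    // Even keys came from the users table, odd keys from the ripples table.
    $table = ($syntheticId % 2 === 0) ? 'tellycards_user_data' : 'tellycards_ripples';

    // Shifting right by one bit undoes both *2 and (*2)+1 for non-negative keys.
    return ['table' => $table, 'id' => $syntheticId >> 1];
}
```

You would call this on each document ID returned by the single Sphinx query, then group the decoded IDs by table before hitting MySQL.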

... BUT I'm not convinced it's a good idea, because if the results are asymmetric, you may miss results. Say there are 20 matching results from one table and 10 from another, and you show the top 10 results. Because of the limit, the results from the second table could well be hidden below those from the first table (an extreme example; in reality, hopefully they are intermingled). Two separate queries guarantee that you get SOME results from each table.
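A toy simulation of that effect, with entirely made-up weights and no Sphinx involved: six user matches that all outrank three ripple matches, and a limit of 4 as in the question's `SetLimits(0, 4)`. The helpers `topN` and `countTable` are hypothetical names for this sketch.

```php
<?php
// Hypothetical helper: returns the $limit highest-weighted matches.
function topN(array $matches, $limit)
{
    usort($matches, function ($a, $b) {
        return $b['weight'] - $a['weight']; // highest weight first
    });
    return array_slice($matches, 0, $limit);
}

// Hypothetical helper: counts how many rows came from a given table.
function countTable(array $rows, $table)
{
    return count(array_filter($rows, function ($row) use ($table) {
        return $row['table'] === $table;
    }));
}

// Made-up data: every user match outranks every ripple match.
$users = [];
for ($i = 0; $i < 6; $i++) {
    $users[] = ['table' => 'users', 'weight' => 200 - $i];
}
$ripples = [];
for ($i = 0; $i < 3; $i++) {
    $ripples[] = ['table' => 'ripples', 'weight' => 100 - $i];
}

// One combined query with a single limit: ripples are pushed out entirely.
$combined = topN(array_merge($users, $ripples), 4);

// Two separate queries, each with its own limit: both tables are represented.
$separate = array_merge(topN($users, 4), topN($ripples, 4));

echo countTable($combined, 'ripples') . " ripples in combined results\n";
echo countTable($separate, 'ripples') . " ripples in separate results\n";
```

With this data the combined list contains no ripples at all, while the separate queries return four users and all three ripples.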

... so after all that: stick with what you've got. It's fine.
This answer was selected by the asker as the best answer.