douyong1285 2014-07-30 15:26
Views: 55
Accepted

How do I return a limited number of records of one type, but an unlimited number of records of other types?

I have a query where I need to return 10 of "Type A" records, while returning all other records. How can I accomplish this?

Update: Admittedly, I could do this with two queries, but I wanted to avoid that, if possible, thinking it would be less overhead, and possibly more performant. My query already is an aggregation query that takes both kinds of records into account, I just need to limit the number of the one type of record in the results.

Update: the following is an example query that highlights the problem:

db.books.aggregate([
    {$geoNear: {near: [-118.09771, 33.89244], distanceField: "distance", spherical: true}},
    {$match:    {"type": "Fiction"}},
    {$project:  {
        'title': 1,
        'author': 1,
        'type': 1,
        'typeSortOrder': 
            {$add: [
                {$cond: [{$eq: ['$type', "Fiction"]}, 1, 0]},
                {$cond: [{$eq: ['$type', "Science"]}, 0, 0]},
                {$cond: [{$eq: ['$type', "Horror"]}, 3, 0]}
        ]},
    }},
    {$sort: {'typeSortOrder': 1}},
    {$limit: 10}
])

db.books.aggregate([
    {$geoNear: {near: [-118.09771, 33.89244], distanceField: "distance", spherical: true}},
    {$match:    {"type": "Horror"}},
    {$project:  {
        'title': 1,
        'author': 1,
        'type': 1,
        'typeSortOrder': 
            {$add: [
                {$cond: [{$eq: ['$type', "Fiction"]}, 1, 0]},
                {$cond: [{$eq: ['$type', "Science"]}, 0, 0]},
                {$cond: [{$eq: ['$type', "Horror"]}, 3, 0]}
        ]},
    }},
    {$sort: {'typeSortOrder': 1}},
    {$limit: 10}
])

db.books.aggregate([
    {$geoNear: {near: [-118.09771, 33.89244], distanceField: "distance", spherical: true}},
    {$match:    {"type": "Science"}},
    {$project:  {
        'title': 1,
        'author': 1,
        'type': 1,
        'typeSortOrder': 
            {$add: [
                {$cond: [{$eq: ['$type', "Fiction"]}, 1, 0]},
                {$cond: [{$eq: ['$type', "Science"]}, 0, 0]},
                {$cond: [{$eq: ['$type', "Horror"]}, 3, 0]}
        ]},
    }},
    {$sort: {'typeSortOrder': 1}},
    {$limit: 10}
])

I would like to have all these records returned in one query, but limit the type to at most 10 of any category. I realize that the typeSortOrder doesn't need to be conditional when the queries are broken out like this, I had it there for when the queries were one query, originally (which is where I would like to get back to).
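For reference, the intended "at most N per type" behavior, sketched in plain JavaScript over an in-memory array (data hypothetical, assuming the input is already sorted nearest-first by distance):

```javascript
// Hypothetical sketch: keep at most n documents per "type", assuming
// the input array is already sorted nearest-first by distance.
function topNPerType(docs, n) {
    var counts = {};
    return docs.filter(function (doc) {
        counts[doc.type] = (counts[doc.type] || 0) + 1;
        return counts[doc.type] <= n;
    });
}

var docs = [
    { title: "Title 1", type: "Fiction", distance: 1 },
    { title: "Title 2", type: "Fiction", distance: 2 },
    { title: "Title 3", type: "Fiction", distance: 3 },
    { title: "Title 4", type: "Science", distance: 1 }
];
console.log(topNPerType(docs, 2).length); // 3: the third "Fiction" is dropped
```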


4 answers

  • douchuanhan8384 2014-08-07 07:57

    Problem


    Returning results like this is not impossible, but it can be impractical. It has been noted in general that you cannot "slice" an array or otherwise "limit" the number of results pushed onto one, yet the method for doing this per "type" essentially relies on arrays.

    The "impractical" part is usually about the number of results, where too large a result set is going to blow up the BSON document limit when "grouping". But, I'm going to consider this with some other recommendations on your "geo search" along with the ultimate goal to return 10 results of each "type" at most.

    Principle


    To first consider and understand the problem, let's look at a simplified "set" of data and the pipeline code necessary to return the "top 2 results" from each type:

    { "title": "Title 1", "author": "Author 1", "type": "Fiction", "distance": 1 },
    { "title": "Title 2", "author": "Author 2", "type": "Fiction", "distance": 2 },
    { "title": "Title 3", "author": "Author 3", "type": "Fiction", "distance": 3 },
    { "title": "Title 4", "author": "Author 4", "type": "Science", "distance": 1 },
    { "title": "Title 5", "author": "Author 5", "type": "Science", "distance": 2 },
    { "title": "Title 6", "author": "Author 6", "type": "Science", "distance": 3 },
    { "title": "Title 7", "author": "Author 7", "type": "Horror", "distance": 1 }
    

    That's a simplified view of the data and somewhat representative of the state of documents after an initial query. Now comes the trick of how to use the aggregation pipeline to get the "nearest" two results for each "type":

    db.books.aggregate([
        { "$sort": { "type": 1, "distance": 1 } },
        { "$group": {
            "_id": "$type",
            "1": { 
                "$first": {
                    "_id": "$_id",
                    "title": "$title",
                    "author": "$author",
                    "distance": "$distance"
                }
             },
             "books": {
                 "$push": {
                    "_id": "$_id",
                    "title": "$title",
                    "author": "$author",
                    "distance": "$distance"
                  }
             }
        }},
        { "$project": {
            "1": 1,
            "books": {
                "$cond": [
                    { "$eq": [ { "$size": "$books" }, 1 ] },
                    { "$literal": [false] },
                    "$books"
                ]
            }
        }},
        { "$unwind": "$books" },
        { "$project": {
            "1": 1,
            "books": 1,
            "seen": { "$eq": [ "$1", "$books" ] }
        }},
        { "$sort": { "_id": 1, "seen": 1 } },
        { "$group": {
            "_id": "$_id",
            "1": { "$first": "$1" },
            "2": { "$first": "$books" },
            "books": {
                "$push": {
                    "$cond": [ { "$not": "$seen" }, "$books", false ]
                }
            }
        }},
        { "$project": {
            "1": 1,
        "2": 1,
            "pos": { "$literal": [1,2] }
        }},
        { "$unwind": "$pos" },
        { "$group": {
            "_id": "$_id",
            "books": {
                "$push": {
                    "$cond": [
                        { "$eq": [ "$pos", 1 ] },
                        "$1",
                        { "$cond": [
                            { "$eq": [ "$pos", 2 ] },
                            "$2",
                            false
                        ]}
                    ]
                }
            }
        }},
        { "$unwind": "$books" },
        { "$match": { "books": { "$ne": false } } },
        { "$project": {
            "_id": "$books._id",
            "title": "$books.title",
            "author": "$books.author",
            "type": "$_id",
            "distance": "$books.distance",
            "sortOrder": {
                "$add": [
                    { "$cond": [ { "$eq": [ "$_id", "Fiction" ] }, 1, 0 ] },
                    { "$cond": [ { "$eq": [ "$_id", "Science" ] }, 0, 0 ] },
                    { "$cond": [ { "$eq": [ "$_id", "Horror" ] }, 3, 0 ] }
                ]
            }
        }},
        { "$sort": { "sortOrder": 1 } }
    ])
    

    Of course that is just two results, but it outlines the process for getting n results, which naturally is done in generated pipeline code. Before moving onto the code the process deserves a walk through.

    After any query, the first thing to do here is $sort the results, and this you want to basically do by both the "grouping key" which is the "type" and by the "distance" so that the "nearest" items are on top.

    The reason for this is shown in the $group stages that will repeat. What is done is essentially "popping" the $first result off of each grouping "stack". So that other documents are not lost, they are placed in an array using $push.

    Just to be safe, the next stage is really only required after the "first step", but could optionally be added for similar filtering in the repetition. The main check here is that the resulting "array" is larger than just one item. Where it is not, the contents are replaced with a single value of false. The reason for which is about to become evident.

    After this "first step" the real repetition cycle begins, where that array is then "de-normalized" with $unwind and then a $project is made in order to "match" the document that was last "seen".

    As only one of the documents will match this condition the results are again "sorted" in order to float the "unseen" documents to the top, while of course maintaining the grouping order. The next thing is similar to the first $group step, but where any kept positions are maintained and the "first unseen" document is "popped off the stack" again.

    The document that was "seen" is then pushed back to the array not as itself but as a value of false. This is not going to match the kept value and this is generally the way to handle this without being "destructive" to the array contents where you don't want the operations to fail should there not be enough matches to cover the n results required.

    Cleaning up when complete, the next "projection" adds an array to the final documents now grouped by "type" representing each position in the n results required. When this array is unwound, the documents can again be grouped back together, but now all in a single array that possibly contains several false values but is n elements long.

    Finally unwind the array again, use $match to filter out the false values, and project to the required document form.
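    That per-iteration trick can be mimicked in plain JavaScript to show why the false placeholders matter (a standalone sketch of the idea, not pipeline code):

```javascript
// Sketch of one "pop and mark seen" iteration: take the first unseen
// element and replace it with false, so the array keeps its length and
// nothing fails when there are fewer elements than positions required.
function popFirstUnseen(books) {
    var next = null;
    var rest = books.map(function (b) {
        if (b === false) return false;                 // already-seen placeholder
        if (next === null) { next = b; return false; } // "pop" the first unseen
        return b;
    });
    return { next: next, books: rest };
}

var step1 = popFirstUnseen(["A", "B", "C"]); // next: "A", books: [false, "B", "C"]
var step2 = popFirstUnseen(step1.books);     // next: "B", books: [false, false, "C"]
```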

    Practicality


    The problem as stated earlier is with the number of results being filtered as there is a real limit on the number of results that can be pushed into an array. That is mostly the BSON limit, but you also don't really want 1000's of items even if that is still under the limit.

    The trick here is keeping the initial "match" small enough that the "slicing operations" become practical. There are some options on the $geoNear pipeline stage that make this possible.

    The obvious one is limit. By default this is 100, but you clearly want to have something in the range of:

    (the number of categories you can possibly match) X ( required matches )

    But if this is essentially a number not in the 1000's then there is already some help here.
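    Under that sizing rule, the initial stage might be capped like this (the numbers here are hypothetical):

```javascript
// Hypothetical sizing: cap the $geoNear output at
// (possible categories) x (required matches per category).
var numCategories = 3;   // e.g. Fiction, Science, Horror
var perCategory   = 10;  // "top 10" per type
var geoNearStage = {
    "$geoNear": {
        "near": [-118.09771, 33.89244],
        "distanceField": "distance",
        "spherical": true,
        "limit": numCategories * perCategory
    }
};
console.log(geoNearStage["$geoNear"].limit); // 30
```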

    The others are maxDistance and minDistance, where essentially you put upper and lower bounds on how "far out" to search. The max bound is the general limiter while the min bound is useful when "paging", which is the next helper.

    When "upwardly paging", you can use the query argument in order to exclude the _id values of documents "already seen" using the $nin query. In much the same way, the minDistance can be populated with the "last seen" largest distance, or at least the smallest largest distance by "type". This allows some concept of filtering out things that have already been "seen" and getting another page.

    Really a topic in itself, but those are the general things to look for in reducing that initial match in order to make the process practical.
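    As a sketch, a "next page" stage built along those lines might look like this (the seen _id values and the distance are hypothetical placeholders):

```javascript
// Hypothetical paging state carried over from the previous page.
var seenIds          = ["id1", "id2", "id3"]; // _id values already returned
var lastSeenDistance = 2.5;                   // smallest "largest distance" per type

// Exclude documents already seen and start searching from the last distance out.
var geoNearStage = {
    "$geoNear": {
        "near": [-118.09771, 33.89244],
        "distanceField": "distance",
        "spherical": true,
        "minDistance": lastSeenDistance,
        "query": { "_id": { "$nin": seenIds } }
    }
};
```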

    Implementing


    The general problem of returning "10 results at most, per type" is clearly going to want some code in order to generate the pipeline stages. No-one wants to type that out, and practically speaking you will probably want to change that number at some point.

    So now to the code that can generate the monster pipeline. All code is in JavaScript, but the principles are easy to translate:

    var coords = [-118.09771, 33.89244];
    
    var key = "$type";
    var val = {
        "_id": "$_id",
        "title": "$title",
        "author": "$author",
        "distance": "$distance"
    };
    var maxLen = 10;
    
    var stack = [];
    var pipe = [];
    var fproj = { "$project": { "pos": { "$literal": []  } } };
    
    pipe.push({ "$geoNear": {
        "near": coords, 
        "distanceField": "distance", 
        "spherical": true
    }});
    
    pipe.push({ "$sort": {
        "type": 1, "distance": 1
    }});
    
    for ( var x = 1; x <= maxLen; x++ ) {
    
        fproj["$project"][""+x] = 1;
        fproj["$project"]["pos"]["$literal"].push( x );
    
        var rec = {
            "$cond": [ { "$eq": [ "$pos", x ] }, "$"+x ]
        };
        if ( stack.length == 0 ) {
            rec["$cond"].push( false );
        } else {
        var lval = stack.pop();
            rec["$cond"].push( lval );
        }
    
        stack.push( rec );
    
        if ( x == 1) {
            pipe.push({ "$group": {
               "_id": key,
               "1": { "$first": val },
               "books": { "$push": val }
            }});
            pipe.push({ "$project": {
               "1": 1,
               "books": {
                   "$cond": [
                        { "$eq": [ { "$size": "$books" }, 1 ] },
                        { "$literal": [false] },
                        "$books"
                   ]
               }
            }});
        } else {
            pipe.push({ "$unwind": "$books" });
            var proj = {
                "$project": {
                    "books": 1
                }
            };
    
            proj["$project"]["seen"] = { "$eq": [ "$"+(x-1), "$books" ] };
    
            var grp = {
                "$group": {
                    "_id": "$_id",
                    "books": {
                        "$push": {
                            "$cond": [ { "$not": "$seen" }, "$books", false ]
                        }
                    }
                }
            };
    
        for ( var n = x; n >= 1; n-- ) {
                if ( n != x ) 
                    proj["$project"][""+n] = 1;
                grp["$group"][""+n] = ( n == x ) ? { "$first": "$books" } : { "$first": "$"+n };
            }
    
            pipe.push( proj );
            pipe.push({ "$sort": { "_id": 1, "seen": 1 } });
            pipe.push(grp);
        }
    }
    
    pipe.push(fproj);
    pipe.push({ "$unwind": "$pos" });
    pipe.push({
        "$group": {
            "_id": "$_id",
            "books": { "$push": stack[0] }
        }
    });
    pipe.push({ "$unwind": "$books" });
    pipe.push({ "$match": { "books": { "$ne": false } }});
    pipe.push({
        "$project": {
            "_id": "$books._id",
            "title": "$books.title",
            "author": "$books.author",
            "type": "$_id",
            "distance": "$books.distance",
            "sortOrder": {
                "$add": [
                    { "$cond": [ { "$eq": [ "$_id", "Fiction" ] }, 1, 0 ] },
                    { "$cond": [ { "$eq": [ "$_id", "Science" ] }, 0, 0 ] },
                    { "$cond": [ { "$eq": [ "$_id", "Horror" ] }, 3, 0 ] },
                ]
            }
        }
    });
    pipe.push({ "$sort": { "sortOrder": 1, "distance": 1 } });
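
    To see what the stack building in that loop produces, the nested $cond construction can be isolated and run on its own (plain JavaScript, no MongoDB connection needed); for maxLen = 2 it reproduces the nesting used in the hand-written pipeline above:

```javascript
// Reproduce just the nested-$cond construction from the generator,
// isolated so the result can be inspected without a database.
function buildPosCond(maxLen) {
    var stack = [];
    for (var x = 1; x <= maxLen; x++) {
        var rec = { "$cond": [ { "$eq": [ "$pos", x ] }, "$" + x ] };
        // The previous expression becomes the "else" branch of the new one.
        rec["$cond"].push(stack.length === 0 ? false : stack.pop());
        stack.push(rec);
    }
    return stack[0];
}

console.log(JSON.stringify(buildPosCond(2)));
// {"$cond":[{"$eq":["$pos",2]},"$2",{"$cond":[{"$eq":["$pos",1]},"$1",false]}]}
```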
    

    Alternate


    Of course the end result here and the general problem with all above is that you really only want the "top 10" of each "type" to return. The aggregation pipeline will do it, but at the cost of keeping more than 10 and then "popping off the stack" until 10 is reached.

    An alternate approach is to "brute force" this with mapReduce and "globally scoped" variables. Not as nice, since the results are all in arrays, but it may be a practical approach:

    db.collection.mapReduce(
        function () {
    
            if ( !stash.hasOwnProperty(this.type) ) {
                stash[this.type] = [];
            }
    
        if ( stash[this.type].length < maxLen ) {
                stash[this.type].push({
                    "title": this.title,
                    "author": this.author,
                    "type": this.type,
                    "distance": this.distance
                });
                emit( this.type, 1 );
            }
    
        },
        function(key,values) {
            return 1;   // really just want to keep the keys
        },
        { 
            "query": {
                "location": {
                    "$nearSphere": [-118.09771, 33.89244]
                }
            },
            "scope": { "stash": {}, "maxLen": 10 },
            "finalize": function(key,value) {
                return { "msgs": stash[key] };                
            },
            "out": { "inline": 1 }
        }
    )
    

    This is a real cheat which just uses the "global scope" to keep a single object whose keys are the grouping keys. The results are pushed onto an array in that global object until the maximum length is reached. Results are already sorted by nearest, so the mapper just gives up doing anything with the current document after the 10 are reached per key.
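    The capping logic can be demonstrated outside of mapReduce with the same stash idea in plain JavaScript (data hypothetical, already sorted nearest-first):

```javascript
// Plain-JS equivalent of the mapReduce "stash" trick: keep at most
// maxLen documents per type, input already sorted by distance.
var maxLen = 2;
var stash = {};
var docs = [
    { title: "Title 1", type: "Fiction", distance: 1 },
    { title: "Title 2", type: "Fiction", distance: 2 },
    { title: "Title 3", type: "Fiction", distance: 3 },
    { title: "Title 4", type: "Science", distance: 1 }
];
docs.forEach(function (doc) {
    if (!stash.hasOwnProperty(doc.type)) stash[doc.type] = [];
    if (stash[doc.type].length < maxLen) stash[doc.type].push(doc);
});
console.log(stash["Fiction"].length); // 2 (the third Fiction is ignored)
```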

    The reducer does no real work, since the emitted values are just counters to keep the keys. The finalize then just "pulls" the stashed array from the global scope and returns it in the result.

    Simple, but of course you don't have all of the $geoNear options if you really need them, and this form has the hard limit of 100 documents as the output from the initial query.

    This answer was accepted by the asker.
