douyi4991 2013-12-29 19:07


MongoDB: compound index decisions

I recently had to optimize certain sets of queries on our MongoDB, and ran into this particular problem:

Say I have a query that matches on A and B, then does a range select on C, and sorts the output on D. In the shell it looks like:

db.collection.find({ A: 'something', B: 'something-else', C: { $gt: 100 } })
             .sort({ D: -1 }).limit(10)

I read a post last year about creating an index for this scenario. Its basic rules:

Exact-match fields go first
The sort field comes second
Range-search fields ($in, $gt, etc.) come last

Their tree explanation looks reasonable so I went ahead and created an index as such:

db.collection.ensureIndex({ A:1, B:1, D:-1, C:-1 })
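
To make the reasoning behind that ordering concrete, here is a minimal sketch in plain JavaScript (no MongoDB involved). It treats an index as a sorted array and counts how many entries each ordering has to walk for the query above. The sample data, the helper names (makeDocs, scanEqualitySortRange, scanEqualityRangeSort) and the counting are all assumptions made for illustration; this is a toy model, not how the real query engine is implemented.

```javascript
// Toy model of a compound index: an array kept in index-key order.
// Field names A, B, C, D come from the question; everything else is hypothetical.

function makeDocs(n) {
  const docs = [];
  for (let i = 0; i < n; i++) {
    docs.push({
      A: i % 2 === 0 ? 'something' : 'other',
      B: i % 3 === 0 ? 'something-else' : 'other',
      C: (i * 37) % 500,   // deterministic spread of range values
      D: (i * 101) % 1000, // deterministic spread of sort values
    });
  }
  return docs;
}

const isPrefixMatch = (doc) =>
  doc.A === 'something' && doc.B === 'something-else';

// Models { A:1, B:1, D:-1, C:-1 }: inside the A/B prefix the entries are
// already in D-descending order, so the scan can stop at `limit` matches.
function scanEqualitySortRange(docs, limit) {
  const entries = docs
    .filter(isPrefixMatch)
    .sort((x, y) => y.D - x.D || y.C - x.C);
  const results = [];
  let scanned = 0;
  for (const entry of entries) {
    scanned++;
    if (entry.C > 100) results.push(entry); // range check on the index key
    if (results.length === limit) break;
  }
  return { results, scanned, inMemorySort: false };
}

// Models { A:1, B:1, C:-1, D:-1 }: the range bounds narrow the scan, but
// every entry in range must be read and then sorted on D in memory.
function scanEqualityRangeSort(docs, limit) {
  const inRange = docs.filter((doc) => isPrefixMatch(doc) && doc.C > 100);
  const results = [...inRange].sort((x, y) => y.D - x.D).slice(0, limit);
  return { results, scanned: inRange.length, inMemorySort: true };
}

const docs = makeDocs(6000);
const esr = scanEqualitySortRange(docs, 10);
const ers = scanEqualityRangeSort(docs, 10);
console.log('equality, sort, range scanned:', esr.scanned);
console.log('equality, range, sort scanned:', ers.scanned);
```

In this model both orderings return the same ten highest D values, but the sort-before-range ordering stops early and needs no in-memory sort, which is the intuition behind the rules above.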

Now the problem: MongoDB decides that BasicCursor is better than this index. If I hint the full index, it works (and is much faster), but doing that would require quite a few changes to our codebase, so we are trying to avoid it if at all possible.

My questions are:

Why does the MongoDB query optimizer decide that { A:1, E:-1 }, { D:-1 } or even BasicCursor is better than { A:1, B:1, D:-1, C:-1 }, when my query includes all 4 fields?
Is { A:1, D:-1 } redundant? The MongoDB docs do say that using a partial index is less efficient.

Furthermore, we also have queries like the following:

db.collection.find({ A: { $in : ['str1','str2'] }, B: 'something', C: { $gt: 100 } })
             .sort({ D: -1 }).limit(10)

To query it efficiently, do we need an extra index like the following? Frankly, I am not sure how the MongoDB query optimizer will treat them.

db.collection.ensureIndex({ B:1, D:-1, C:-1, A:1 })

These are the explain outputs for my query with and without a hint.

with hint (full index): http://pastebin.com/xtpJ3dsf
with hint (A,D index): http://pastebin.com/v66QmtsP
without hint: http://pastebin.com/QAtM0WN0
without hint (dropped other index): http://pastebin.com/6ZDweiNX

It turns out it was defaulting to { A:1, E:-1 }, not { A:1, D:-1 }, which seems even stranger, as we didn't query on field E.

I dropped the index on { A:1, E:-1 }, and explain now tells me it defaults to { D:-1 }, so I dropped that as well. Now MongoDB uses BasicCursor... It seems to like neither my full index nor the { A:1, D:-1 } index (even though hinting either one gives much better performance).

This feels weird.


2 answers

douxu3732 2013-12-30 20:35


The only reason something "unusual" like this would happen is if your data distribution happens to be such that BasicCursor actually completes the query (i.e. finds all the matching documents) faster than an indexed query. Same thing for a "partial" index.

A specific case where this would happen, using your data structure as an example: if a has relatively few distinct values at the beginning of the collection, and b has extremely low cardinality (i.e. very few distinct values, like one or a handful), then scanning the collection in order or using a "less efficient" index will show equal or better performance than using the theoretically "ideal" index.

Here's an example where the first 1000 documents have a=1 and b=2 - later documents are very differently distributed.

> db.compound4.find({a:1, b:2, d:{$lt:100}}).sort({c:-1}).limit(10).explain(true)
{
    "cursor" : "BtreeCursor a_1",
    "isMultiKey" : false,
    "n" : 10,
    "nscannedObjects" : 18,
    "nscanned" : 18,
    "nscannedObjectsAllPlans" : 46,
    "nscannedAllPlans" : 56,
    "scanAndOrder" : true,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "a" : [
            [
                1,
                1
            ]
        ]
    },
    "allPlans" : [
        {
            "cursor" : "BtreeCursor a_1",
            "n" : 18,
            "nscannedObjects" : 18,
            "nscanned" : 18,
            "indexBounds" : {
                "a" : [
                    [
                        1,
                        1
                    ]
                ]
            }
        },
        {
            "cursor" : "BtreeCursor a_1_b_1_c_1_d_1 reverse",
            "n" : 10,
            "nscannedObjects" : 10,
            "nscanned" : 20,
            "indexBounds" : {
                "a" : [
                    [
                        1,
                        1
                    ]
                ],
                "b" : [
                    [
                        2,
                        2
                    ]
                ],
                "c" : [
                    [
                        {
                            "$maxElement" : 1
                        },
                        {
                            "$minElement" : 1
                        }
                    ]
                ],
                "d" : [
                    [
                        100,
                        -1.7976931348623157e+308
                    ]
                ]
            }
        },
        {
            "cursor" : "BasicCursor",
            "n" : 18,
            "nscannedObjects" : 18,
            "nscanned" : 18,
            "indexBounds" : {

            }
        }
    ]
}

Since the compound index is large, it takes longer to traverse than the smaller partial index, and because the selectivity of "b" is very poor, that query plan falls behind.
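
The effect of data distribution can be illustrated with a small plain-JavaScript sketch (again, no MongoDB involved). It only counts how many documents an in-order scan has to examine before it has seen 10 documents matching the filter from the example above; it ignores the sort and does not model the real query planner. The collection builders and the helper docsExaminedUntil are hypothetical names invented for this illustration.

```javascript
// Count how many documents an in-order scan examines before it has
// found k documents that satisfy the filter.
function docsExaminedUntil(docs, matches, k) {
  let examined = 0;
  let found = 0;
  for (const doc of docs) {
    examined++;
    if (matches(doc)) found++;
    if (found === k) break;
  }
  return { examined, found };
}

// Same filter as the example query: a = 1, b = 2, d < 100.
const matchesQuery = (doc) => doc.a === 1 && doc.b === 2 && doc.d < 100;

// Filler document that can never match (a is always >= 2).
const filler = (i) => ({ a: 2 + (i % 97), b: i % 5, d: i % 500 });

// Assumed distribution 1: the first 1000 documents all have a=1, b=2.
function clusteredCollection(n) {
  const docs = [];
  for (let i = 0; i < n; i++) {
    docs.push(i < 1000 ? { a: 1, b: 2, d: i % 50 } : filler(i));
  }
  return docs;
}

// Assumed distribution 2: the same kind of matching document, but only
// every 20th document matches, spread across the whole collection.
function spreadCollection(n) {
  const docs = [];
  for (let i = 0; i < n; i++) {
    docs.push(i % 20 === 0 ? { a: 1, b: 2, d: i % 50 } : filler(i));
  }
  return docs;
}

const clustered = docsExaminedUntil(clusteredCollection(20000), matchesQuery, 10);
const spread = docsExaminedUntil(spreadCollection(20000), matchesQuery, 10);
console.log('clustered: examined', clustered.examined, 'documents');
console.log('spread:    examined', spread.examined, 'documents');
```

When the matching documents are clustered at the start, a plain scan finds them almost immediately, so an index has little room to do better; when they are spread out, the same scan has to read many more documents.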

This answer was selected by the asker as the best answer.
