douyi4991 2013-12-29 19:07
浏览 136
已采纳

MongoDB:复合索引决策

I recently had to optimize certain sets of queries on our MongoDB, and run into this particular problem:

say I have a query that match on A and B, then do a range select on C, and output by sorting on D, so in shell they look like:

db.collection.find({ A: 'something', B: 'something-else', C: { $gt: 100 } })
             .sort({ D: -1 }).limit(10)

I read a post last year that talked about creating index for such scenario, their basic rules:

  1. Exact value match field go first
  2. Sorting field comes second
  3. Range search ($in, $gt etc.) field comes last

Their tree explanation looks reasonable so I went ahead and created an index as such:

db.collection.ensureIndex({ A:1, B:1, D:-1, C:-1 })

Now the problem comes: mongodb decides BasicCursor is better than this index. If I hint the full index it works (and much faster), but doing that would require quite a few changes on our codebase, so we are trying to avoid that if at all possible.


My questions are:

  1. Why does mongodb query optimizer decides { A:1, E:-1 }, { D:-1 } or even BasicCursor are better than { A:1, B:1, D:-1, C:-1 }, when my query includes all 4 fields.

  2. Is { A:1, D:-1 } redundant, mongo docs does say using partial index is less efficient?

Furthermore, we also have queries like following:

db.collection.find({ A: { $in : ['str1','str2'] }, B: 'something', C: { $gt: 100 } })
             .sort({ D: -1 }).limit(10)

To efficiently query it, do we need an extra index like following? Frankly I am not sure how will MongoDB query optimizer treat them.

db.collection.ensureIndex({ B:1, D:-1, C:-1, A:1 })

These are the explain for my query with and without hint.

Turns out it was defaulting to { A:1, E:-1 } not { A:1, D:-1 }, which seem even stranger as we did't query on field E.

I dropped the index on { A:1, E:-1 }, now explain tells me it defaults to { D:-1 }, so I dropped it as well, now MongoDB begin using BasicCursor... It doesn't seem to like neither my full index nor the A:1, D:-1 index (despite hint result in much better performance).

This feels weird.

  • 写回答

2条回答 默认 最新

  • douxu3732 2013-12-30 20:35
    关注

    The only reason something "unusual" like this would happen is if your data distribution happens to be such that BasicCursor actually completes the query (i.e. finds all the matching documents) faster than an indexed query. Same thing for a "partial" index.

    A specific case where this would happen, using your data structure as an example is if a has relatively few distinct values at the beginning of a collection, and b has extremely low cardinality (i.e. very few distinct values, like one or a handful) then scanning the collection in order or using a "less efficient" index will show equal or better performance than using theoretically "ideal" index.

    Here's an example where the first 1000 documents have a=1 and b=2 - later documents are very differently distributed.

    > db.compound4.find({a:1, b:2, d:{$lt:100}}).sort({c:-1}).limit(10).explain(true)
    {
        "cursor" : "BtreeCursor a_1",
        "isMultiKey" : false,
        "n" : 10,
        "nscannedObjects" : 18,
        "nscanned" : 18,
        "nscannedObjectsAllPlans" : 46,
        "nscannedAllPlans" : 56,
        "scanAndOrder" : true,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "indexBounds" : {
            "a" : [
                [
                    1,
                    1
                ]
            ]
        },
        "allPlans" : [
            {
                "cursor" : "BtreeCursor a_1",
                "n" : 18,
                "nscannedObjects" : 18,
                "nscanned" : 18,
                "indexBounds" : {
                    "a" : [
                        [
                            1,
                            1
                        ]
                    ]
                }
            },
            {
                "cursor" : "BtreeCursor a_1_b_1_c_1_d_1 reverse",
                "n" : 10,
                "nscannedObjects" : 10,
                "nscanned" : 20,
                "indexBounds" : {
                    "a" : [
                        [
                            1,
                            1
                        ]
                    ],
                    "b" : [
                        [
                            2,
                            2
                        ]
                    ],
                    "c" : [
                        [
                            {
                                "$maxElement" : 1
                            },
                            {
                                "$minElement" : 1
                            }
                        ]
                    ],
                    "d" : [
                        [
                            100,
                            -1.7976931348623157e+308
                        ]
                    ]
                }
            },
            {
                "cursor" : "BasicCursor",
                "n" : 18,
                "nscannedObjects" : 18,
                "nscanned" : 18,
                "indexBounds" : {
    
                }
            }
        ]
    }
    

    Since the compound index is large it takes longer to traverse than the smaller partial index and because of selectivity of "b" is not very good (i.e. very bad) it makes that query plan fall behind.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(1条)

报告相同问题?

悬赏问题

  • ¥15 乌班图ip地址配置及远程SSH
  • ¥15 怎么让点阵屏显示静态爱心,用keiluVision5写出让点阵屏显示静态爱心的代码,越快越好
  • ¥15 PSPICE制作一个加法器
  • ¥15 javaweb项目无法正常跳转
  • ¥15 VMBox虚拟机无法访问
  • ¥15 skd显示找不到头文件
  • ¥15 机器视觉中图片中长度与真实长度的关系
  • ¥15 fastreport table 怎么只让每页的最下面和最顶部有横线
  • ¥15 java 的protected权限 ,问题在注释里
  • ¥15 这个是哪里有问题啊?