duanbi6522 2017-09-15 16:34
Views: 32
Accepted

mgo with aggregation and grouping

I am trying to perform a query with Go's mgo driver that effectively returns distinct values from a join. I understand that this might not be the best paradigm to work with in MongoDB.

Something like this:

pipe := []bson.M{

    {
        "$group": bson.M{
            "_id":  bson.M{"user": "$user"},

        },
    },

    {
        "$match": bson.M{
            "_id":  bson.M{"$exists": 1},
            "user": bson.M{"$exists": 1},
            "date_updated": bson.M{
                "$gt": durationDays,
            },
        },

    },

    {
        "$lookup": bson.M{
            "from":         "users",
            "localField":   "user",
            "foreignField": "_id",
            "as":           "user_details",
        },
    },
    {
        "$lookup": bson.M{
            "from":         "organizations",
            "localField":   "organization",
            "foreignField": "_id",
            "as":           "organization_details",
        },
    },

}

err := d.Pipe(pipe).All(&result)

If I comment out the $group section, the query returns the join as expected.

If I run it as is, I get NULL.

If I move the $group to the bottom of the pipe, I get an array response with null values.

Is it possible to do an aggregation with a $group (with the goal of simulating DISTINCT)?

1 answer

  • duanhao8540 2017-09-19 00:57

    You're getting NULL because your $match stage filters out every document produced by the $group stage.

    After the first $group stage, the documents contain only the _id field, for example:

      {"_id": { "user": "foo"}},
      {"_id": { "user": "bar"}},
      {"_id": { "user": "baz"}}
    

    They no longer contain the other fields, i.e. user, date_updated and organization. If you would like to keep their values, you can use a Group Accumulator Operator. Depending on your use case, you may also benefit from Aggregation Expression Variables.
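To make the effect concrete, here is a minimal in-memory sketch in Go of what the original pipeline order does. It does not talk to MongoDB: `Doc`, `groupByUser` and `matchHasField` are hypothetical helpers that only mimic the `$group` and `$match` stages, and the sample documents are made up for illustration.

```go
package main

import "fmt"

// Doc is a simplified stand-in for a MongoDB document (an assumption for this sketch).
type Doc map[string]interface{}

// groupByUser mimics {"$group": {"_id": {"user": "$user"}}}:
// every output document carries only the _id key, nothing else.
// It assumes user values are comparable (strings here).
func groupByUser(docs []Doc) []Doc {
	seen := map[interface{}]bool{}
	var out []Doc
	for _, d := range docs {
		u := d["user"]
		if seen[u] {
			continue
		}
		seen[u] = true
		out = append(out, Doc{"_id": Doc{"user": u}})
	}
	return out
}

// matchHasField mimics {"$match": {field: {"$exists": true}}} for top-level fields.
func matchHasField(docs []Doc, field string) []Doc {
	var out []Doc
	for _, d := range docs {
		if _, ok := d[field]; ok {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	docs := []Doc{
		{"user": "foo", "organization": "acme", "date_updated": 3},
		{"user": "foo", "organization": "acme", "date_updated": 5},
		{"user": "bar", "organization": "initech", "date_updated": 4},
	}
	grouped := groupByUser(docs)
	fmt.Println(len(grouped))                        // 2 distinct users
	fmt.Println(len(matchHasField(grouped, "user"))) // 0: "user" is gone from the top level
}
```

The second count is zero for the same reason the real pipeline returns nothing: after grouping, there is no top-level user field left for the filter to find.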

    As an example in the mongo shell, let's use the $first operator, which simply picks the value from the first document in each group. This may make sense for organization but not for date_updated, so please choose a more appropriate accumulator operator for that field.

    {"$group": { 
              "_id":"$user", 
              "date_updated": {"$first":"$date_updated"}, 
              "organization": {"$first":"$organization"}
             }
    }
    

    Note that the above also replaces {"_id":{"user":"$user"}} with the simpler {"_id":"$user"}.
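For reference, the same stage could be written in Go roughly as follows. This is a sketch, not the asker's actual code: `M` is a local stand-in for `bson.M` (which is also a `map[string]interface{}`) so the snippet compiles without the driver, and using `$max` for date_updated is only one possible choice of accumulator, assumed here to mean "most recent update per user".

```go
package main

import "fmt"

// M is a local stand-in for bson.M so this sketch compiles without the driver.
type M map[string]interface{}

// groupStage builds a $group stage that yields one document per distinct user.
func groupStage() M {
	return M{
		"$group": M{
			"_id": "$user",
			// $max keeps the largest date_updated in each group (an assumed choice).
			"date_updated": M{"$max": "$date_updated"},
			// $first keeps the organization of the first document seen in each group.
			"organization": M{"$first": "$organization"},
		},
	}
}

func main() {
	fmt.Println(groupStage())
}
```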

    Next, we'll add a $project stage to rename the _id field produced by the group operation back to user, and carry the other fields along unmodified.

    {"$project": {
                  "user": "$_id", 
                  "date_updated": 1, 
                  "organization": 1
                 }
     }
    

    Your $match stage can be simplified to just the date_updated filter. The _id condition can be removed because it is no longer relevant at this point in the pipeline. If you would like to make sure that you only process documents with a user value, you should place a $match before the $group. See Aggregation Pipeline Optimization for more.

    Combined, the pipeline will look something like this:
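If you do want that early filter, one way to express it in Go is to prepend a `$match` stage to whatever pipeline you already have. The helper name `prependUserFilter` is hypothetical, and `M` is again a local stand-in for `bson.M`.

```go
package main

import "fmt"

// M is a local stand-in for bson.M so this sketch compiles without the driver.
type M map[string]interface{}

// prependUserFilter returns a new pipeline that starts with a $match stage
// keeping only documents that have a user field. The input slice is not modified.
func prependUserFilter(pipeline []M) []M {
	filter := M{"$match": M{"user": M{"$exists": true}}}
	return append([]M{filter}, pipeline...)
}

func main() {
	pipeline := []M{
		{"$group": M{"_id": "$user"}},
	}
	for i, stage := range prependUserFilter(pipeline) {
		fmt.Println(i, stage)
	}
}
```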

    [
     {"$group":{ 
                 "_id": "$user", 
                 "date_updated": { "$first": "$date_updated"}, 
                 "organization": { "$first": "$organization"} 
               }
     },
     {"$project":{ 
                   "user": "$_id", 
                   "date_updated": 1, 
                   "organization": 1
                 }
     }, 
     {"$match":{
              "date_updated": {"$gt": durationDays } }
     }, 
     {"$lookup":{
                 "from": "users", 
                 "localField": "user", 
                 "foreignField": "_id", 
                 "as": "user_details"
                }
     }, 
     {"$lookup":{
                "from": "organizations", 
                "localField": "organization", 
                "foreignField": "_id", 
                "as": "organization_details"
                }
     }
    ]
    

    Lastly (I know you're aware of this), given the schema above with separate users and organizations collections, you may want to reconsider embedding some values, depending on your application's use case. You may find 6 Rules of Thumb for MongoDB Schema Design useful.
