duanhu2414 2015-10-15 09:40
浏览 172
已采纳

多个指标Elasticsearch上的相同聚合

I have setup snowplow with Elasticsearch.

When I want to get the data out I just do normal queries and use aggregates to get them by day, country etc.

So I want to figure out clickthru rate for these aggregations, I have 2 kind of events: page views and clicks.

Currently I do 2 queries:

Page Views:

{
    "size": 0,
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "event": "page_view"
                            }
                        }
                    ],
                    "must_not": {
                        "term": {
                            "br_family": "Robot"
                        }
                    }
                }
            }
        }
    },
    "aggs": {
        "dates": {
            "date_histogram": {
                "field": "collector_tstamp",
                "interval": "day"
            }
        }
    }
}

Clicks:

{
    "size": 0,
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "event": "struct"
                            }
                        },
                        {
                            "term": {
                                "se_action": "click"
                            }
                        }
                    ],
                    "must_not": {
                        "term": {
                            "br_family": "Robot"
                        }
                    }
                }
            }
        }
    },
    "aggs": {
        "dates": {
            "date_histogram": {
                "field": "collector_tstamp",
                "interval": "day"
            }
        }
    }
}

I format the response to something easier to use and then merge them in PHP using something like this.

function merge_metrics($pv,$c){
    $r = array();

    if(count($pv) > 0){
        foreach ($pv as $key => $value) {
            $r[$value['name']]['page_views'] += $value['count']; 
        }
    }
    if(count($c) > 0){
        foreach ($c as $key => $value) {
            $r[$value['name']]['clicks'] += $value['count']; 
        }
    }

    $rf = array();

    foreach ($r as $key => $value) {
        $tmp_clicks = isset($value['clicks']) ? $value['clicks'] : 0;
        $tmp_page_views = isset($value['page_views']) ? isset($value['page_views']) : 0;
        $rf[] = array(
                'name' => $key,
                'page_views' => $tmp_page_views,
                'clicks' => $tmp_clicks,
                'ctr' => ctr($tmp_clicks,$tmp_page_views)
            ); 
    }

    return $rf;
}

Both $pv and $c are arrays that contain the aggregates that result from querying Elasticsearch and I do some formatting for ease of use.

My question is:

Is it possible get multiple metrics(in my case page views and clicks, these are specific filters) and perform same aggregations on both ? then returning the aggregations something like :

{
    "data": [
        {
            "day": "2015-10-13",
            "page_views": 61,
            "clicks": 0,
        },
        {
            "day": "2015-10-14",
            "page_views": 135,
            "clicks": 1,
        },
        {
            "day": "2015-10-15",
            "page_views": 39,
            "clicks": 0,
        }
    ]
}

But without me having to manually merge them ?

  • 写回答

1条回答 默认 最新

  • duanjiong2021 2015-10-15 11:12
    关注

    Yes, it is definitely possible if you merge your aggregations into one single query. For instance, I suppose you have one query like this for page views:

    {
        "query": {...}
        "aggregations": {
            "by_day": {
                "date_histogram": {
                    "field": "day",
                    "interval": "day"
                },
                "aggs": {
                    "page_views_per_day": {
                        "sum": {
                            "field": "page_views"
                        }
                    }
                }
            }
        }
    }
    

    And another query like this for clicks:

    {
        "query": {...}
        "aggregations": {
            "by_day": {
                "date_histogram": {
                    "field": "day",
                    "interval": "day"
                },
                "aggs": {
                    "clicks_per_day": {
                        "sum": {
                            "field": "clicks"
                        }
                    }
                }
            }
        }
    }
    

    Provided you have the same constraints in your query, you can definitely merge them together at the date_histogram level, like this:

    {
        "query": {...}
        "aggregations": {
            "by_day": {
                "date_histogram": {
                    "field": "day",
                    "interval": "day"
                },
                "aggs": {
                    "page_views_per_day": {
                        "sum": {
                            "field": "page_views"
                        }
                    },
                    "clicks_per_day": {
                        "sum": {
                            "field": "clicks"
                        }
                    }
                }
            }
        }
    }
    

    UPDATE

    Since your queries are different for each of your aggregations, we need to do it slightly differently, i.e. by using an additional filters aggregation, like this:

    {
      "size": 0,
      "query": {
        "filtered": {
          "filter": {
            "bool": {
              "must": [
                {
                  "terms": {
                    "event": [
                      "page_view",
                      "struct"
                    ]
                  }
                }
              ],
              "should": {
                "term": {
                  "se_action": "click"
                }
              },
              "must_not": {
                "term": {
                  "br_family": "Robot"
                }
              }
            }
          }
        }
      },
      "aggs": {
        "dates": {
          "date_histogram": {
            "field": "collector_tstamp",
            "interval": "day"
          },
          "aggs": {
            "my_filters": {
              "filters": {
                "filters": {
                  "page_views_filter": {
                    "bool": {
                      "must": [
                        {
                          "term": {
                            "event": "page_view"
                          }
                        }
                      ],
                      "must_not": {
                        "term": {
                          "br_family": "Robot"
                        }
                      }
                    }
                  },
                  "clicks_filter": {
                    "bool": {
                      "must": [
                        {
                          "term": {
                            "event": "struct"
                          }
                        },
                        {
                          "term": {
                            "se_action": "click"
                          }
                        }
                      ],
                      "must_not": {
                        "term": {
                          "br_family": "Robot"
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    

    Now for each daily bucket, you're going to end up with two sub-buckets, one for the count of page views and another for the count of clicks.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 如何在scanpy上做差异基因和通路富集?
  • ¥20 关于#硬件工程#的问题,请各位专家解答!
  • ¥15 关于#matlab#的问题:期望的系统闭环传递函数为G(s)=wn^2/s^2+2¢wn+wn^2阻尼系数¢=0.707,使系统具有较小的超调量
  • ¥15 FLUENT如何实现在堆积颗粒的上表面加载高斯热源
  • ¥30 截图中的mathematics程序转换成matlab
  • ¥15 动力学代码报错,维度不匹配
  • ¥15 Power query添加列问题
  • ¥50 Kubernetes&Fission&Eleasticsearch
  • ¥15 報錯:Person is not mapped,如何解決?
  • ¥15 c++头文件不能识别CDialog