douyu2817 2019-08-22 15:15

BigTable: one big query or a dozen small queries?

I store a series of events in BigTable in the following form:

rowKey                | col_1 | col_2
----------------------|-------|------
uuid1!uuid2!timestamp | val1  | val2
....

col_1 holds a float64 and col_2 holds a string 63 characters long.

Specific ranges within this series of events are grouped and loosely associated with an object we'll call an operation:

{
    "id": 123,
    "startDate": "2019-07-15T14:02:12.335+02:00",
    "endDate": "2019-07-15T14:02:16.335+02:00"
}

So you could say that an operation is a time window of events, and it may cover anywhere from 10 to 1000 events.

When I want to display this data to the user, I first query the operation objects, and then I execute a BigTable query for each operation to find the events it covers.

Through monitoring I've discovered that each BigTable query (against a development instance, mind you) may take between 20 ms and 300 ms.

This got me wondering: given BigTable's architecture, does it make sense to execute small, individual queries?

Does it make more sense to execute one big query that covers my range of operations, then divide the events to their respective operations in my application?


1 answer

  • duanjing9339 2019-08-22 16:39

    Most likely yes, but the details matter here.

    If there are only a few operations per user request then it may actually be better to issue the small queries in parallel. This will get you the best possible latency per request, at the expense of some per-request CPU overhead for your cluster. Your application code will also be more complicated.
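As a rough illustration of the parallel approach, here is a minimal sketch. It does not use the real Bigtable client: `EVENTS` and `read_range` are hypothetical in-memory stand-ins for the table and for a single range read, and the operations are assumed to carry their bounds as epoch milliseconds.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the events table: (timestamp_ms, value)
# pairs sorted by timestamp, as rows sharing one uuid1!uuid2 prefix would be.
EVENTS = [(t, float(t)) for t in range(0, 1000, 10)]


def read_range(start_ms, end_ms):
    """Stand-in for one range read over [start_ms, end_ms] (not a real API)."""
    return [e for e in EVENTS if start_ms <= e[0] <= end_ms]


def fetch_events_parallel(operations, max_workers=8):
    """Issue one range read per operation, up to max_workers at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            op["id"]: pool.submit(read_range, op["start"], op["end"])
            for op in operations
        }
        # Collect results; total wall time is roughly that of the slowest
        # read rather than the sum of all reads.
        return {op_id: future.result() for op_id, future in futures.items()}


operations = [
    {"id": 1, "start": 0, "end": 40},
    {"id": 2, "start": 100, "end": 120},
]
events_by_op = fetch_events_parallel(operations)
```

With a real client, `read_range` would be replaced by whatever range-read call the client library provides; the thread pool structure stays the same.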

    If there are lots of operations per user request, you'll definitely want the increased throughput efficiency that you get from scanning.
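The single-scan variant could look something like the sketch below. Again, `EVENTS` and `read_range` are hypothetical in-memory stand-ins rather than Bigtable client calls, and timestamps are assumed to be epoch milliseconds. One read covers the whole span of the requested operations, and the rows are then split per operation in the application using binary search.

```python
from bisect import bisect_left, bisect_right

# Hypothetical in-memory stand-in for the events table, sorted by timestamp.
EVENTS = [(t, float(t)) for t in range(0, 1000, 10)]


def read_range(start_ms, end_ms):
    """Stand-in for one range scan over [start_ms, end_ms] (not a real API)."""
    return [e for e in EVENTS if start_ms <= e[0] <= end_ms]


def fetch_events_single_scan(operations):
    """Run one scan over the union of all windows, then split it per operation."""
    if not operations:
        return {}
    lo = min(op["start"] for op in operations)
    hi = max(op["end"] for op in operations)
    rows = read_range(lo, hi)  # one round trip; rows come back in key order
    timestamps = [t for t, _ in rows]
    result = {}
    for op in operations:
        # Binary search for the slice of rows inside this operation's window.
        i = bisect_left(timestamps, op["start"])
        j = bisect_right(timestamps, op["end"])
        result[op["id"]] = rows[i:j]
    return result


operations = [
    {"id": 1, "start": 0, "end": 40},
    {"id": 2, "start": 100, "end": 120},
]
events_by_op = fetch_events_single_scan(operations)
```

Note the trade-off: the scan also reads any events that fall in the gaps between operations, so this pays off most when the operations are dense within the scanned range.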

    For an advanced use case you could also compromise between the two and break the scan into N shards which you run in parallel, where N << #operations.
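One possible shape for the sharded compromise is sketched below, under the same assumptions as before: `EVENTS` and `read_range` are hypothetical in-memory stand-ins, not Bigtable client calls, and `n_shards` plays the role of N. Operations are sorted by start time, cut into N contiguous groups, and each group is served by one scan running in its own thread.

```python
from bisect import bisect_left, bisect_right
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the events table, sorted by timestamp.
EVENTS = [(t, float(t)) for t in range(0, 1000, 10)]


def read_range(start_ms, end_ms):
    """Stand-in for one range scan over [start_ms, end_ms] (not a real API)."""
    return [e for e in EVENTS if start_ms <= e[0] <= end_ms]


def scan_shard(shard):
    """One scan covering every operation in the shard, split per operation."""
    lo = min(op["start"] for op in shard)
    hi = max(op["end"] for op in shard)
    rows = read_range(lo, hi)
    timestamps = [t for t, _ in rows]
    return {
        op["id"]: rows[bisect_left(timestamps, op["start"]):
                       bisect_right(timestamps, op["end"])]
        for op in shard
    }


def fetch_events_sharded(operations, n_shards=2):
    """Group time-ordered operations into at most n_shards scans, run in parallel."""
    if not operations:
        return {}
    ordered = sorted(operations, key=lambda op: op["start"])
    size = -(-len(ordered) // n_shards)  # ceiling division
    shards = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    result = {}
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        for partial in pool.map(scan_shard, shards):
            result.update(partial)
    return result


operations = [
    {"id": 1, "start": 0, "end": 40},
    {"id": 2, "start": 100, "end": 120},
    {"id": 3, "start": 500, "end": 520},
    {"id": 4, "start": 900, "end": 990},
]
events_by_op = fetch_events_sharded(operations, n_shards=2)
```

Grouping neighbouring operations keeps each scan's range tight, so less data from the gaps between operations is read than with a single scan over everything.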

    The one thing you definitely shouldn't do is send the small requests one at a time, as you'll just produce a bunch of unnecessary round trips!

    This answer was selected as the best answer by the asker.
