douyu2817 2019-08-22 15:15

BigTable: one big query or a dozen small queries?

I store a series of events in BigTable in the following form:

rowKey                | col_1 | col_2
----------------------|-------|------
uuid1!uuid2!timestamp | val1  | val2
....

col_1 holds a float64 and col_2 holds a string 63 characters long.

Specific ranges within this series of events are grouped and loosely associated with an object we'll call an operation:

{
    "id": 123,
    "startDate": "2019-07-15T14:02:12.335+02:00",
    "endDate": "2019-07-15T14:02:16.335+02:00"
}

So you could say that an operation is a time window of events; each one may cover 10 to 1000 events.

When I want to display this data to the user, I first query the operation objects, and then I execute a BigTable query for each operation to find the events it covers.

Through monitoring I've discovered that each BigTable query (against a development instance, mind you) may take between 20 ms and 300 ms.

This got me wondering, given BigTable's architecture - does it make sense to execute small, individual queries?

Does it make more sense to execute one big query that covers my range of operations, then divide the events to their respective operations in my application?


1 answer

  • duanjing9339 2019-08-22 16:39

    Most likely yes, one big query makes more sense, but the details matter here.

    If there are only a few operations per user request then it may actually be better to issue the small queries in parallel. This will get you the best possible latency per request, at the expense of some per-request CPU overhead for your cluster. Your application code will also be more complicated.
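A minimal sketch of the parallel variant, assuming each operation has already been turned into a row-key range. `read_range` is a hypothetical stand-in for whatever function performs one Bigtable range read in your application; it is not part of any client library, and the `(id, start_key, end_key)` tuple layout is also an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# (operation id, start row key, end row key); this layout is an assumption.
KeyRange = Tuple[int, str, str]


def fetch_all_parallel(
    ranges: List[KeyRange],
    read_range: Callable[[str, str], List[dict]],
    max_workers: int = 8,
) -> Dict[int, List[dict]]:
    """Issue one small range read per operation, up to max_workers at a time.

    `read_range` is a hypothetical callable wrapping a single range read.
    """
    if not ranges:
        return {}
    with ThreadPoolExecutor(max_workers=min(max_workers, len(ranges))) as pool:
        # Submit every read before waiting on any of them.
        futures = {
            op_id: pool.submit(read_range, start, end)
            for op_id, start, end in ranges
        }
        # .result() re-raises any exception raised in the worker thread.
        return {op_id: fut.result() for op_id, fut in futures.items()}
```

With this shape, the latency of a user request is roughly that of the slowest single read rather than the sum of all reads, as long as `max_workers` is at least the number of operations.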

    If there are lots of operations per user request, you'll definitely want the increased throughput efficiency that you get from a single scan over the whole range.
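The scan-then-split approach could look roughly like the sketch below. It assumes timestamps are integers (for example epoch milliseconds), that operations do not overlap, and that both `startDate` and `endDate` are inclusive; none of this is stated in the question, so adjust as needed. The names `scan_bounds` and `assign_events` are illustrative only.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# (operation id, start timestamp, end timestamp); integer timestamps assumed.
Operation = Tuple[int, int, int]
# (event timestamp, payload)
Event = Tuple[int, dict]


def scan_bounds(operations: List[Operation]) -> Tuple[int, int]:
    """Smallest time window that covers every operation."""
    return min(op[1] for op in operations), max(op[2] for op in operations)


def assign_events(
    operations: List[Operation], events: List[Event]
) -> Dict[int, List[Event]]:
    """Bucket the events returned by one big scan into their operations.

    Assumes operations do not overlap and that both bounds are inclusive.
    Events that fall between operations are dropped.
    """
    ordered = sorted(operations, key=lambda op: op[1])
    starts = [op[1] for op in ordered]
    buckets: Dict[int, List[Event]] = {op[0]: [] for op in ordered}
    for ts, payload in events:
        # Index of the last operation that starts at or before this event.
        i = bisect_right(starts, ts) - 1
        if i >= 0 and ts <= ordered[i][2]:
            buckets[ordered[i][0]].append((ts, payload))
    return buckets
```

The application would scan once over the window returned by `scan_bounds` and pass the resulting events to `assign_events`. The cost is that events lying in the gaps between operations are read and then discarded.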

    For an advanced use case you could also compromise between the two and break the scan into N shards which you run in parallel, where N is much smaller than the number of operations.
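One way to pick the shards, sketched under the same assumptions as before (integer timestamps, operations as `(id, start, end)` tuples): sort the operations by start time and cut them into at most N contiguous groups, each of which becomes one scan. `shard_scans` is a hypothetical helper, and splitting by operation count rather than by time span or event volume is just one possible choice.

```python
from typing import List, Tuple

# (operation id, start timestamp, end timestamp); integer timestamps assumed.
Operation = Tuple[int, int, int]
# (scan start, scan end, operations covered by this scan)
Shard = Tuple[int, int, List[Operation]]


def shard_scans(operations: List[Operation], n: int) -> List[Shard]:
    """Group operations into at most n contiguous shards, ordered by start."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not operations:
        return []
    ordered = sorted(operations, key=lambda op: op[1])
    size = -(-len(ordered) // n)  # ceiling division: operations per shard
    shards: List[Shard] = []
    for i in range(0, len(ordered), size):
        group = ordered[i:i + size]
        # Each shard scans from its first start to its latest end.
        shards.append((group[0][1], max(op[2] for op in group), group))
    return shards
```

Each shard's scan can then be run in parallel and its events split among the operations in that shard, so gaps between shards are never read at all.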

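    The one thing you definitely shouldn't do is send the small requests one at a time, as you'll just produce a bunch of unnecessary round trips! At the 20 ms to 300 ms per query you measured, 50 operations queried in sequence would take between 1 and 15 seconds.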

    This answer was selected by the asker as the best answer.
