BigTable: one big query or a dozen small queries?

I store a series of events in BigTable in the following form:

rowKey                | col_1 | col_2
----------------------|-------|------
uuid1!uuid2!timestamp | val1  | val2
....

col_1 holds a float64 and col_2 holds a string 63 characters long.

Specific ranges within this series of events are grouped and loosely associated with an object we'll call an operation:

{
    "id": 123,
    "startDate": "2019-07-15T14:02:12.335+02:00",
    "endDate": "2019-07-15T14:02:16.335+02:00"
}

So you could say that an operation is a time window of events, and it may be associated with 10 to 1000 events.

When I want to display this data to the user, I first query the operation objects, and then I execute a BigTable query for each operation to find the events it covers.

Through monitoring I've discovered that each BigTable query (against a development instance, mind you) may take between 20 ms and 300 ms.

This got me wondering: given BigTable's architecture, does it make sense to execute small, individual queries?

Does it make more sense to execute one big query that covers my range of operations, then divide the events to their respective operations in my application?
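To make the second option concrete, here is a minimal sketch of the application-side step: compute one time range that covers all operations, then split the returned events back into their operations. It assumes the events arrive sorted by timestamp as `(timestamp, payload)` tuples and that `endDate` is inclusive; the function names `scan_bounds` and `group_events_by_operation` are hypothetical, and the BigTable read itself is left out.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


def scan_bounds(operations):
    """Return the (earliest start, latest end) covering every operation."""
    starts = [datetime.fromisoformat(op["startDate"]) for op in operations]
    ends = [datetime.fromisoformat(op["endDate"]) for op in operations]
    return min(starts), max(ends)


def group_events_by_operation(events, operations):
    """Assign events to operations by time window.

    events: list of (timestamp, payload) tuples, sorted by timestamp.
    operations: list of dicts with "id", "startDate", "endDate".
    Assumption: both window ends are inclusive.
    """
    timestamps = [ts for ts, _ in events]
    grouped = {}
    for op in operations:
        start = datetime.fromisoformat(op["startDate"])
        end = datetime.fromisoformat(op["endDate"])
        lo = bisect_left(timestamps, start)   # first event at or after start
        hi = bisect_right(timestamps, end)    # one past the last event at or before end
        grouped[op["id"]] = events[lo:hi]
    return grouped
```

Events that fall in the gaps between operations are read by the big scan and then discarded, so this trades some wasted bytes for fewer round trips.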

1 answer

duanjing9339 2019-08-22 16:39
Most likely yes, but the details matter here.

If there are only a few operations per user request then it may actually be better to issue the small queries in parallel. This will get you the best possible latency per request, at the expense of some per-request CPU overhead for your cluster. Your application code will also be more complicated.
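As a rough sketch of the parallel approach, the snippet below fans the small queries out over a thread pool. `fetch_events` is a hypothetical callable standing in for whatever BigTable read your client performs for a single operation; it is not part of any BigTable library, and the pool size of 8 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all_parallel(operations, fetch_events, max_workers=8):
    """Run one small query per operation concurrently.

    fetch_events(op) is assumed to return the events for one operation.
    Returns a dict mapping operation id -> events.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs,
        # so they can be zipped back onto the operations.
        results = pool.map(fetch_events, operations)
        return {op["id"]: events for op, events in zip(operations, results)}
```

With this shape, the request latency is roughly that of the slowest single query rather than the sum of all of them.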

If there are lots of operations per user request, you'll definitely want the increased throughput efficiency that you get from scanning.

For an advanced use case you could also compromise between the two and break the scan into N shards which you run in parallel, where N << #operations.
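One way to picture the sharded variant: cut the overall time range into N contiguous pieces and scan each piece separately. The sketch below only computes the shard boundaries; `split_range` is a hypothetical helper, and splitting evenly by time is an assumption (it ignores how the events are actually distributed).

```python
from datetime import datetime, timedelta


def split_range(start, end, n):
    """Split [start, end) into n contiguous half-open time ranges."""
    if n < 1:
        raise ValueError("n must be at least 1")
    step = (end - start) / n
    # Use the exact end for the last boundary to avoid rounding drift.
    bounds = [start + step * i for i in range(n)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))
```

Treating each shard as half-open means an event that sits exactly on a boundary belongs to one shard only, so the merged result has no duplicates.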

The one thing you definitely shouldn't do is send the small requests one at a time, as you'll just produce a bunch of unnecessary round trips!

This answer was accepted by the asker as the best answer.

Question asked 2019-08-22 15:15.