BigTable: one big query or a dozen small queries?

I store a series of events in BigTable in the following form:

rowKey                | col_1 | col_2
----------------------|-------|------
uuid1!uuid2!timestamp | val1  | val2
....

col_1 holds a float64 and col_2 holds a string 63 characters long.

Specific ranges within this series of events are grouped and loosely associated with an object we'll call an operation:

{
    "id": 123,
    "startDate": "2019-07-15T14:02:12.335+02:00",
    "endDate": "2019-07-15T14:02:16.335+02:00"
}

So you could say that an operation is a time window of events, and it may be associated with 10 to 1000 events.

When I want to display this data to the user, I first query the operation objects, and then I execute a BigTable query for each operation to find the events it covers.

Through monitoring I've discovered that each BigTable query (on a development instance, mind you) may take between 20 ms and 300 ms.

This got me wondering: given BigTable's architecture, does it make sense to execute small, individual queries?

Or does it make more sense to execute one big query that covers my whole range of operations, then divide the events among their respective operations in my application?


1 answer

duanjing9339 2019-08-22 16:39
Most likely yes, one big query makes more sense, but the details matter here.

If there are only a few operations per user request, then it may actually be better to issue the small queries in parallel. This will get you the best possible latency per request, at the expense of some per-request CPU overhead for your cluster. Your application code will also be more complicated.
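As a rough illustration of issuing the small queries in parallel, here is a minimal Python sketch. `fetch_events` is a hypothetical stand-in that returns fake data; in a real application it would perform one BigTable range read for the operation's time window. The function names and the `max_workers` value are assumptions, not part of any client library.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_events(operation):
    """Hypothetical stand-in for one small range read.

    A real version would scan the rows covering the operation's time window.
    """
    return [f"event-{operation['id']}-{i}" for i in range(3)]


def fetch_all_parallel(operations, max_workers=8):
    """Run one small query per operation concurrently.

    Returns a dict mapping operation id to its list of events.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input operations.
        results = pool.map(fetch_events, operations)
        return {op["id"]: events for op, events in zip(operations, results)}


operations = [{"id": 123}, {"id": 124}, {"id": 125}]
events_by_operation = fetch_all_parallel(operations)
```

The total latency is then roughly that of the slowest single query rather than the sum of all of them.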

If there are lots of operations per user request, you'll definitely want the increased throughput efficiency that you get from scanning.
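If you go with a single scan, the application has to hand each event back to its operation. The sketch below shows one way to do that in Python with a binary search over the operation start times. It assumes that operations do not overlap and that timestamps are comparable values such as epoch milliseconds; the field names `start` and `end` and the sample data are made up for illustration.

```python
from bisect import bisect_right


def group_events(operations, events):
    """Assign each (timestamp, payload) event to the operation containing it.

    Assumes non-overlapping operations. Events that fall outside every
    operation window are dropped.
    """
    ops = sorted(operations, key=lambda op: op["start"])
    starts = [op["start"] for op in ops]
    grouped = {op["id"]: [] for op in ops}
    for ts, payload in events:
        # Index of the last operation that starts at or before ts.
        i = bisect_right(starts, ts) - 1
        if i >= 0 and ts <= ops[i]["end"]:
            grouped[ops[i]["id"]].append((ts, payload))
    return grouped


operations = [
    {"id": 123, "start": 1000, "end": 1400},
    {"id": 124, "start": 2000, "end": 2400},
]
# Stand-in for the rows returned by one scan from the earliest start
# to the latest end.
events = [(1000, "a"), (1200, "b"), (1700, "gap"), (2000, "c"), (2400, "d")]
grouped = group_events(operations, events)
```

Note that the single scan also reads events that sit in the gaps between operations, so this pays off most when the operations are close together in time.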

For an advanced use case you could also compromise between the two and break the scan into N shards which you run in parallel, where N is much smaller than the number of operations.
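One possible way to derive the shards, sketched in Python: sort the operations by start time, cut them into N contiguous groups, and scan one range per group. The helper name `shard_ranges` and the field names are assumptions for illustration; each returned range would then be scanned in parallel and its events grouped per operation in the application.

```python
def shard_ranges(operations, n_shards):
    """Split operations into at most n_shards contiguous groups.

    Returns one (start, end) scan range per group, covering every
    operation in that group.
    """
    ops = sorted(operations, key=lambda op: op["start"])
    if not ops:
        return []
    size = -(-len(ops) // n_shards)  # ceiling division
    ranges = []
    for i in range(0, len(ops), size):
        chunk = ops[i:i + size]
        ranges.append((chunk[0]["start"], max(op["end"] for op in chunk)))
    return ranges


operations = [
    {"id": i, "start": i * 1000, "end": i * 1000 + 400} for i in range(10)
]
ranges = shard_ranges(operations, n_shards=3)
```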

The one thing you definitely shouldn't do is send the small requests one at a time, as you'll just produce a bunch of unnecessary round trips!

This answer was selected as the best answer by the asker.
