Secondary index for a key/value database

Let's say I have a data structure like this:

type User struct {
    UUid      string
    Username  string
    Email     string
    Password  string
    FirstName string
    LastName  string
}

I am storing Users []User in a key/value database, LevelDB. The unique key is the UUid, and the user struct is encoded and stored against that UUID.

var network bytes.Buffer // Stand-in for a network connection
enc := gob.NewEncoder(&network)
// Encode the user struct into the buffer with gob.
err := enc.Encode(user)
if err != nil {
    log.Println("Error in encoding gob")
    return "", err
}
// Store the encoded bytes under the user's UUID.
err = dbSession.DBSession.Put([]byte(user.UUid), network.Bytes(), nil)

Since the key for every entry is the unique UUID, I want to build a secondary index on email so that I don't have to scan all the entries in the database to find the one corresponding to a given email.

What I have done: I created a key called SIndex and stored a map[string]string in it, where each map key is an email and each value is the UUID. Every time a new entry comes in, this SIndex map is updated to accommodate the new UUID and email.

It's a bad approach: as the data grows, the whole map stored under SIndex has to be fetched and decoded on every write. If the email doesn't exist yet, a new key is added to the map, and the entire map is encoded and stored back again.

A B-tree would be a better fit.

My question: is it right to store secondary index data in the database itself? If not, what strategies should I use to implement a secondary index? I know the choice of secondary index is greatly influenced by the data, but are there any good out-of-the-box indexing algorithms other than B-trees and hash maps?


1 answer

dp20011 2019-02-06 09:00
Is it right to store secondary index data in the Database itself

Yes, this is okay. But as Jonas pointed out in the comments, you should store each email as its own key with the UUID as its value, rather than keeping the whole index as one map under a single key. Another option is to use the email as the primary key for your database instead of the UUID. That way you don't need a secondary index at all.
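A minimal sketch of the "one index entry per email" layout, assuming a single keyspace split by key prefixes. The `KV` interface, the `memKV` in-memory store, the `user:` and `email:` prefixes, and the function names are all hypothetical stand-ins chosen for illustration; they are not LevelDB's API, and a real LevelDB handle would take the place of `memKV`.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"errors"
	"fmt"
)

// User mirrors the struct from the question, trimmed to the fields used here.
type User struct {
	UUid     string
	Username string
	Email    string
}

// KV is a hypothetical stand-in for the database handle: just Put and Get.
type KV interface {
	Put(key, value []byte) error
	Get(key []byte) ([]byte, error)
}

// ErrNotFound is returned by memKV when a key is absent.
var ErrNotFound = errors.New("key not found")

// memKV is an in-memory KV used only to keep this sketch runnable.
type memKV map[string][]byte

func (m memKV) Put(key, value []byte) error {
	m[string(key)] = append([]byte(nil), value...) // store a copy
	return nil
}

func (m memKV) Get(key []byte) ([]byte, error) {
	v, ok := m[string(key)]
	if !ok {
		return nil, ErrNotFound
	}
	return v, nil
}

// Key prefixes keep primary records and index entries apart in one keyspace.
const (
	userPrefix  = "user:"
	emailPrefix = "email:"
)

// PutUser writes the record under its UUID and one index entry under its email.
func PutUser(db KV, u User) error {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(u); err != nil {
		return err
	}
	if err := db.Put([]byte(userPrefix+u.UUid), buf.Bytes()); err != nil {
		return err
	}
	return db.Put([]byte(emailPrefix+u.Email), []byte(u.UUid))
}

// GetUserByEmail does two point lookups: email -> UUID, then UUID -> record.
func GetUserByEmail(db KV, email string) (User, error) {
	var u User
	id, err := db.Get([]byte(emailPrefix + email))
	if err != nil {
		return u, err
	}
	raw, err := db.Get([]byte(userPrefix + string(id)))
	if err != nil {
		return u, err
	}
	err = gob.NewDecoder(bytes.NewReader(raw)).Decode(&u)
	return u, err
}

func main() {
	db := memKV{}
	if err := PutUser(db, User{UUid: "u-1", Username: "alice", Email: "alice@example.com"}); err != nil {
		panic(err)
	}
	u, err := GetUserByEmail(db, "alice@example.com")
	fmt.Println(u.UUid, u.Username, err)
}
```

Adding a user now touches only two small keys instead of rewriting one ever-growing map. The two writes in `PutUser` are not atomic in this sketch; with LevelDB, grouping them in a write batch would likely be the way to keep the record and its index entry consistent.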

Another strategy, for better performance, is to keep the secondary index (email as key, UUID as value) in an in-memory database such as Redis, or possibly in LevelDB itself if it is configured to store its data in memory.
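As a rough illustration of the in-memory idea without pulling in Redis, the sketch below keeps the email-to-UUID mapping in a mutex-guarded Go map. The `EmailIndex` type and its methods are hypothetical names invented for this example, and the sketch assumes the index is rebuilt from the primary records at startup, since nothing here is persisted.

```go
package main

import (
	"fmt"
	"sync"
)

// EmailIndex is a hypothetical in-memory secondary index: email -> UUID.
type EmailIndex struct {
	mu     sync.RWMutex
	byMail map[string]string
}

// NewEmailIndex returns an empty index ready for use.
func NewEmailIndex() *EmailIndex {
	return &EmailIndex{byMail: make(map[string]string)}
}

// Add records one mapping. It returns false if the email already
// belongs to a different UUID, leaving the existing entry untouched.
func (ix *EmailIndex) Add(email, uuid string) bool {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	if existing, ok := ix.byMail[email]; ok && existing != uuid {
		return false
	}
	ix.byMail[email] = uuid
	return true
}

// Lookup returns the UUID stored for an email, if any.
func (ix *EmailIndex) Lookup(email string) (string, bool) {
	ix.mu.RLock()
	defer ix.mu.RUnlock()
	id, ok := ix.byMail[email]
	return id, ok
}

func main() {
	ix := NewEmailIndex()
	// Assumption: in a real program this loop would iterate over the
	// primary records in the database at startup.
	for _, u := range []struct{ UUid, Email string }{
		{"u-1", "alice@example.com"},
		{"u-2", "bob@example.com"},
	} {
		ix.Add(u.Email, u.UUid)
	}
	id, ok := ix.Lookup("bob@example.com")
	fmt.Println(id, ok)
}
```

The trade-off is the usual one for in-memory indexes: lookups avoid disk access entirely, but the index costs memory proportional to the number of users and has to be rebuilt or reloaded after a restart.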

Are there any good out of box indexing algorithms other than B-Tree, HashMaps

First, B-trees and hash maps are data structures, not algorithms. And what you did is not really indexing with a hash map; it is just storing a hash map as the value of one key. Indexing usually depends on the DBMS implementation, so we can only choose from the options the DBMS provides.

Whether a given index data structure is a good choice really depends on the use case. For example, if you need range searches you can use a B-tree (the default in most DBMSs), a B+ tree (the default in MySQL InnoDB), or a skip list (Redis uses one for its Sorted Set). The Redis documentation on secondary indexing covers Sorted Set indexes in more detail.
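To make the range-search point concrete, here is a small sketch of why ordered structures help: once keys are kept in sorted order, a range query is two binary searches plus a contiguous slice. A sorted Go slice stands in for the B-tree or skip list here, and `RangeScan` is a hypothetical helper written for this example, not part of any database API.

```go
package main

import (
	"fmt"
	"sort"
)

// RangeScan returns every key k with lo <= k < hi from a slice that is
// already sorted in ascending order.
func RangeScan(sortedKeys []string, lo, hi string) []string {
	start := sort.SearchStrings(sortedKeys, lo) // first index with key >= lo
	end := sort.SearchStrings(sortedKeys, hi)   // first index with key >= hi
	if start >= end {
		return nil
	}
	return sortedKeys[start:end]
}

func main() {
	keys := []string{
		"email:carol@example.com",
		"email:alice@example.com",
		"email:dave@example.com",
		"email:bob@example.com",
	}
	sort.Strings(keys)
	// All index keys whose email starts with "b" or "c".
	fmt.Println(RangeScan(keys, "email:b", "email:d"))
}
```

A hash table cannot answer this kind of query without visiting every entry, because hashing discards the ordering of the keys.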

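And for your case, you only need to store the email as key and the UUID as value, which is a plain equality lookup. A hash table is the classic structure for this, with O(1) average lookup time, and some DBMSs offer hash indexes for exactly this kind of access. Note that LevelDB itself is not hash-based: it is a log-structured merge-tree (LSM-tree) that keeps its keys sorted. A Get on the email key is still an efficient point lookup, and the sorted order additionally gives you ordered iteration and prefix scans.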

This answer was accepted by the asker as the best answer.
