dongrong1856
2019-02-04 12:06
浏览 48
已采纳

键/值数据库的二级索引

Lets say, I have data structure like

 type User struct {
      UUid string 
      Username string
      Email String 
      Password string 
      FirstName string 
      LastName string
}

I am storing Users []User into a key/value database in levelDB. The unique key will be UUid and then user struct will be endoed and stored against this UUID.

var network bytes.Buffer // Stand-in for a network connection
enc := gob.NewEncoder(&network)
err := enc.Encode(user)
   if err != nil {
      log.Println("Error in encoding gob")
      return "", err
 }
err = dbSession.DBSession.Put([]byte(user.UserID), network.Bytes(), nil)

Since the key for all the entries is the unique uuid, I want to make a secondary index on email so that I dont necessarily have to scan all the entries present in the database to find a particular entry corresponding to an Email.

What I have Done: I have created a key called as SIndex and stored a map[string][string] data structure in it, where a key will be an email and value will be the uuid. Every time a new entry comes in, This Sindex will be updated to acommodate the new uuid and email.

Its a bad approach: Because as data grows, Whole map corresponding to Sindex needs to be fetched and decoded, If email doesn't exists, add a new key to Sindex, encode it and store back again.

A B-tree would be a better fit.

My question : Is it right to store secondary index data in the Database itself, if not what strategies shall I use to implement a secondary Index, I know the choice of secondary index greatly influenced by the data but Are there any good out of box indexing algorithms other than B-Tree, HashMaps?

  • 写回答
  • 关注问题
  • 收藏
  • 邀请回答

1条回答 默认 最新

  • dp20011 2019-02-06 09:00
    已采纳

    Is it right to store secondary index data in the Database itself

    Yes, this is okay. But as pointed out by Jonas in the comment, you should put the email as key and UUID as value. Another option is to use email as the key for your database instead of using UUID. This way you don't need to use a secondary index.

    Another strategy for better performance, you can use in-memory databases such as Redis (or maybe LevelDB itself can be used to store the data in memory) to store the secondary index (email as key and UUID as value).

    Are there any good out of box indexing algorithms other than B-Tree, HashMaps

    Anyway, B-Tree and HashMap are data structures, not algorithms. And what you did actually is not indexing with HashMap, it's just storing HashMaps as values for your key. Indexing usually depends on the DBMS implementation (we can only choose from the options they provided).

    So, about the data structures used for indexing, whether it's good or not, really depends on the use cases. For example, if you need to do range search you can use B-Tree (used by default by most of the DBMSs), B+ tree (used by default by MySQL InnoDB), and Skip List (Redis use this data structure for its Sorted Set). You can read more about secondary indexing with Redis Sorted Set here.

    And for your case, you only need to store email as key and UUID as value. Hash Table is commonly used for this. Most of the DBMSs use this data structure to do primary key access with just O(1) time complexity. And I believe LevelDB implementation is also based on this data structure.

    打赏 评论

相关推荐 更多相似问题