dongrenshi0889 2014-08-04 18:44
浏览 35

去:没有哈希表(又名映射)的有效的字符串查找?

I have a problem in Golang whereby I need to be able to lookup string keys, from about 5,000,000 strings, each of which containing only a-z (lowercase) and 0-9 characters. Similar problem with uint32 and uint64 as keys.

A map (hash table) is perfect for this, but it uses much too much RAM.

There must be known methods for this type of thing, I've been looking into B-Tree but I'm not sure it would be the most efficient mechanism.

Some of the particularities of my problem, which could lead to a more efficient solution, are:

  1. The keys need only be strings of a-z0-9 or simple uint values.
  2. Once built it only needs be read-only.

Seeing as it only needs be read-only then it seems to me that having it as a pre-sorted list with a series of indexes, might work well. I thought at first I might be able to just have it in slices with a 36 (26 letters + 10 numbers) index for each level (i.e. character)... but of course that means 36^whatever which ends up being the opposite of efficient. Then I thought maybe I could put only a single index of 36 for each level, but then I end up with a load of arrays/slices that need to be intersected to get the ID of the result.

I guess I'm looking for some kind of very specific B-Tree implementation, but more tuned to my purpose (without the B.)

Does anyone know of anything that exists like I am suggesting?

  • 写回答

4条回答 默认 最新

  • douchu5131 2014-08-04 18:56
    关注

    I'd give a try to a Compressed Trie. It's the data structure perfectly usable in a scenario with lexicographic keys. B-Trees are mostly intended for external memories because they're minimizing the depth of a tree. A trie or a more memory efficient hashing is the right way to go.

    评论

报告相同问题?

悬赏问题

  • ¥15 ETLCloud 处理json多层级问题
  • ¥15 matlab中使用gurobi时报错
  • ¥15 这个主板怎么能扩出一两个sata口
  • ¥15 不是,这到底错哪儿了😭
  • ¥15 2020长安杯与连接网探
  • ¥15 关于#matlab#的问题:在模糊控制器中选出线路信息,在simulink中根据线路信息生成速度时间目标曲线(初速度为20m/s,15秒后减为0的速度时间图像)我想问线路信息是什么
  • ¥15 banner广告展示设置多少时间不怎么会消耗用户价值
  • ¥15 可见光定位matlab仿真
  • ¥15 arduino 四自由度机械臂
  • ¥15 wordpress 产品图片 GIF 没法显示