Succinct binary tree of sums: array construction and addressing

I am using 'sum' as shorthand for some arbitrary computation. I have a process that computes a single sum from a list of values by recursively summing pairs of values. Unpaired values are promoted up the tree unaltered until they can be paired.

Given this computation, I'm looking for the best balance between computation (i.e. the number of operations required to access array elements/nodes) and the most succinct encoding of all nodes in a one-dimensional array (i.e. no gaps, nil values, or repeated values). Preferably this works without an additional index array that cannot be easily derived from the succinct encoding and would therefore have to be saved along with the array.

Although the following are simple examples, in reality the number of values in the initial list can be extraordinarily large (2^47 or more).

For example, given the list [1, 2, 3, 4], the array is trivial: [10, 3, 7, 1, 2, 3, 4]. It splits nicely into rows that are easy to address by node, or as a reference to the entire row.

But for a 5 item list the tree looks like this:

Tree 1

         15
        /  \
       /    \
      /      \
     /        \
    10          5
  /   \       /   \
 3     7     5     -
/ \   / \   / \   / \
1  2  3  4 5   - -   -

The standard mapping left = i*2+1, right = i*2+2 gives us this array:

Array 1

[ 15, 10,  5,  3,   7,   5,  nil,   1,   2,   3,   4,   5,   nil,   nil, nil]

This array has 4 nil values, and the last element in the list '5' is repeated 2 times.
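
As a rough illustration of how Array 1 comes about, the sketch below pads the leaf row to a power of two with None (standing in for nil), combines pairs row by row, and concatenates the rows from the root down. The name heap_layout is made up for this example, and plain addition stands in for the arbitrary 'sum'.

```python
def heap_layout(values):
    """Build the padded, heap-ordered array (children of i at 2*i+1 and 2*i+2)."""
    # Pad the leaf row to the next power of two with None (nil).
    width = 1
    while width < len(values):
        width *= 2
    row = list(values) + [None] * (width - len(values))

    rows = [row]
    while len(row) > 1:
        parent = []
        for i in range(0, len(row), 2):
            left, right = row[i], row[i + 1]
            if left is None:
                parent.append(None)          # nothing below this node
            elif right is None:
                parent.append(left)          # unpaired value is promoted unchanged
            else:
                parent.append(left + right)  # '+' stands in for the real computation
        rows.append(parent)
        row = parent

    # Root row first, leaf row last.
    return [v for r in reversed(rows) for v in r]


array1 = heap_layout([1, 2, 3, 4, 5])
print(array1)
```

For the 5 item list this reproduces Array 1, including the four nil cells and the promoted copies of 5.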

To improve this we can imply the repetition of the 5, and remove the nil values:

Array 2

[15, 10, 3, 7, 1, 2, 3, 4, 5]

Which is much more compact. This tree is the same, but conceptually looks a bit like:

Tree 2

       15
      / \
     /   \
    10    \
  /   \    \
 3     7    \
/ \   / \    \
1  2  3  4    5

In the Array 2 encoding I have 4 rows:

1. [1, 2, 3, 4]
2. [3, 7]
3. [10, 5]
4. [15]

Rows 1, 2 and 4 can simply be references into Array 2 allowing me to compute results 'in-place' with no allocations or copies. Very fast. Row 3 however, contains values in two non-contiguous cells. I have to break the simple logic used for the other rows, and possibly add copy, indexing or storage for a map.

I can construct complete/balanced subtrees (such as indexes 1-7, the tree for 1, 2, 3, 4), but they will not always be so nicely aligned, because the odd number of items appears at different rows depending on the input length. For example, consider a tree with an initial list of 6 elements.

dongzhang3482 2017-07-04 20:26
Let's assume your tree has N nodes on the final (most numerous) row.

If you do store the nodes that are only propagated upwards, your tree has between 2*N-1 and 2*N-1+log2(N) nodes, total. The exact total number of nodes is given by OEIS A120511. Of these, at most floor(2 + log2(N-1)) are copied/propagated nodes.

The tree has floor(2 + log2(N-1)) rows. The number of rows as a function of N (the number of elements on the final row) is OEIS A070941.

The number of rows in such trees is quite low. For example, if you have 2⁴⁰ ≈ 1,000,000,000,000 nodes in the final row, you have only 41 rows in the tree. For 2⁶⁴ nodes, you have just 65. Therefore, if you need some operation per row, it is not a high overhead.

A simple logarithmic-time function can compute the number of rows and the total number of nodes, given the number of nodes in the final row N:

    # Account for the root node
    rows = 1
    total = 1
    curr_left = N
    While (curr_left > 1):
        rows = rows + 1
        total = total + curr_left
        curr_left = (curr_left + 1) / 2
    End While

where / denotes integer division, i.e. any fractional part is discarded (truncated towards zero). Again, for 2⁶⁴ nodes in the final row, the above loops only 64 times.
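
A direct Python transcription of the loop above might look like the following sketch. The function name rows_and_total is an assumption for illustration; `//` is Python's integer division.

```python
def rows_and_total(n):
    """Return (rows, total) for a tree whose final row holds n nodes."""
    rows = 1       # account for the root row
    total = 1      # ... and the root node itself
    curr_left = n
    while curr_left > 1:
        rows += 1
        total += curr_left
        curr_left = (curr_left + 1) // 2   # ceil(curr_left / 2) nodes on the row above
    return rows, total


print(rows_and_total(5))
print(rows_and_total(2**40)[0])
```

For the 5 item list from the question this gives 4 rows and 11 nodes in total (the 9 nodes of Array 2 plus the two propagated copies of 5).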

When we know the total number of nodes in the tree, and the number of rows, we can use another logarithmic-time loop to compute the offset of the first element on each row of the tree, and the number of nodes on that row:

    first_offset = []
    nodes = []
    curr_row = rows - 1
    curr_offset = total - N
    curr_left = N
    While (curr_left > 1):
        nodes[curr_row] = curr_left
        first_offset[curr_row] = curr_offset
        curr_row = curr_row - 1
        curr_left = (curr_left + 1) / 2
        curr_offset = curr_offset - curr_left
    End While
    first_offset[0] = 0
    nodes[0] = 1

As before, for 2⁶⁴ nodes in the final row, the above loops only 64 times.
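
The two loops can be combined into one small Python function, sketched below. The name row_layout and the returned tuple are assumptions for illustration; the variable names follow the pseudocode.

```python
def row_layout(n):
    """Return (rows, total, first_offset, nodes) for n nodes on the final row."""
    # First pass: number of rows and total number of nodes.
    rows, total, curr_left = 1, 1, n
    while curr_left > 1:
        rows += 1
        total += curr_left
        curr_left = (curr_left + 1) // 2

    # Second pass: walk from the final row up to the root.
    first_offset = [0] * rows   # row 0 (the root) starts at index 0
    nodes = [1] * rows          # ... and holds a single node
    curr_row = rows - 1
    curr_offset = total - n
    curr_left = n
    while curr_left > 1:
        nodes[curr_row] = curr_left
        first_offset[curr_row] = curr_offset
        curr_row -= 1
        curr_left = (curr_left + 1) // 2
        curr_offset -= curr_left
    return rows, total, first_offset, nodes


print(row_layout(5))
```

For N = 5 the rows start at offsets 0, 1, 3 and 6 and hold 1, 2, 3 and 5 nodes respectively.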

All elements on a row are consecutive in memory, and if we use zero-based indexing, and N is the number of nodes on the final row, and we apply the above, then

rows is the number of rows in the tree

total is the total number of nodes in the tree

There are nodes[r] nodes on row r, if r >= 0 and r < rows

Array index for node on row r, column c is first_offset[r] + c

Node on row r, column c, with r > 0, has a parent on row r-1, column c/2, at array index first_offset[r-1] + c/2

Node on row r, column c, with r < rows - 1, has a left child on row r+1, column 2*c, at array index first_offset[r+1] + 2*c

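Node on row r, column c, with r < rows - 1 and 2*c + 1 < nodes[r+1], has a right child on row r+1, column 2*c+1, at array index first_offset[r+1] + 2*c + 1. This always holds when c < nodes[r] - 1; the last node on a row has a right child only when row r+1 holds an even number of nodes

Node on row r, column c, with r < rows - 1 and 2*c + 1 < nodes[r+1], has both a left and a right child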
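
To make the addressing rules concrete, here is a minimal sketch that fills the compact array bottom-up for a small list. The helper names (row_layout, build_tree, parent_index) are hypothetical, addition stands in for the arbitrary 'sum', and row_layout derives the same first_offset[] and nodes[] values through an equivalent shortcut (repeated halving plus prefix sums).

```python
def row_layout(n):
    """Return (first_offset, nodes) for a tree with n >= 1 leaves, root row first."""
    nodes = [n]
    while nodes[0] > 1:
        nodes.insert(0, (nodes[0] + 1) // 2)   # each row above has ceil(size / 2) nodes
    first_offset = [0] * len(nodes)
    for r in range(1, len(nodes)):
        first_offset[r] = first_offset[r - 1] + nodes[r - 1]
    return first_offset, nodes


def build_tree(values):
    """Fill the compact array bottom-up; an unpaired node is copied to its parent."""
    first_offset, nodes = row_layout(len(values))
    rows = len(nodes)
    tree = [None] * (first_offset[-1] + nodes[-1])
    tree[first_offset[-1]:] = values                  # leaves occupy the final row
    for r in range(rows - 2, -1, -1):
        for c in range(nodes[r]):
            left = first_offset[r + 1] + 2 * c        # left child always exists
            if 2 * c + 1 < nodes[r + 1]:              # right child exists
                tree[first_offset[r] + c] = tree[left] + tree[left + 1]
            else:                                     # unpaired: promote unchanged
                tree[first_offset[r] + c] = tree[left]
    return tree, first_offset, nodes


def parent_index(first_offset, r, c):
    """Array index of the parent of the node on row r > 0, column c."""
    return first_offset[r - 1] + c // 2


tree, first_offset, nodes = build_tree([1, 2, 3, 4, 5])
print(tree)
print(tree[first_offset[1]:first_offset[1] + nodes[1]])
```

For [1, 2, 3, 4, 5] this yields [15, 10, 5, 3, 7, 5, 1, 2, 3, 4, 5]. Every row r is the contiguous slice starting at first_offset[r] with nodes[r] entries, so the row [10, 5] that was split across non-contiguous cells in the question's Array 2 is a single slice here.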

This array is compact, and other than the nodes that get propagated upwards (so, maybe a few dozen nodes for a terabyte-sized dataset), wastes no storage.

If the number of nodes in the final row is stored with the array (for example, as an extra uint64_t following the array data), all readers can recover total, rows, first_offset[], and nodes[], and easily navigate the tree. (However, note that instead of just the array index, you use the "column" and "row" instead, and derive the array index using those.)
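
One possible way to store N alongside the data is sketched below. It assumes every node value fits in an unsigned 64-bit integer and uses little-endian byte order; both are choices made for this example, as are the function names pack_tree and unpack_tree.

```python
import struct


def pack_tree(tree, n):
    """Serialize the nodes as little-endian uint64 values, followed by N as one more uint64."""
    return struct.pack("<%dQ" % len(tree), *tree) + struct.pack("<Q", n)


def unpack_tree(blob):
    """Recover (tree, n) from bytes produced by pack_tree."""
    count = len(blob) // 8                      # number of uint64 values, including N
    *tree, n = struct.unpack("<%dQ" % count, blob)
    return tree, n


blob = pack_tree([15, 10, 5, 3, 7, 5, 1, 2, 3, 4, 5], 5)
restored, n = unpack_tree(blob)
print(len(blob), n)
```

A reader that has N can then rerun the two loops from this answer to rebuild total, rows, first_offset[] and nodes[].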

Because first_offset[] and nodes[] arrays have at most a few dozen entries, they should stay hot in caches, and using them should not harm performance.

Note that not all tree sizes are valid for the rules stated in the second paragraph above. For example, a tree with two nodes makes no sense: why would you duplicate the root node?

If you do know that the tree size (total) is valid, you can find N based on total in O(log2(total)*log2log2(total)) time complexity using a binary search, or in O((log2(total))²) if you use a simple loop. Remember, total is between 2*N-1 and 2*N-1+log2(N). Conversely, N cannot be greater than (total + 1)/2, or smaller than (total + 1)/2 - log2(total), because total is greater than N, and therefore log2(N) is less than log2(total). So, a binary search could be implemented as

    Function Find_N(total):
        Nmax = (total + 1) / 2
        Nmin = Nmax - log2(total)
        If Nmin < 1:
            Nmin = 1
        End If

        # Lower bound: Total(Nmin) must not exceed total
        t = Total(Nmin)
        If t == total:
            Return Nmin
        Else if t > total:
            Return "Bug!"
        End If

        # Upper bound: Total(Nmax) must not fall short of total
        t = Total(Nmax)
        If t == total:
            Return Nmax
        Else if t < total:
            Return "Invalid tree size!"
        End If

        Loop:
            N = (Nmin + Nmax) / 2
            If N == Nmin:
                Return "Invalid tree size!"
            End If
            t = Total(N)
            If t > total:
                Nmax = N
            Else if t < total:
                Nmin = N
            Else:
                Return N
            End If
        End Loop
    End Function

Keep in mind that even with 2⁶⁴ nodes in the tree, the search range spans only about 64 candidates, so the above function makes roughly 2 + log2(64) = 8 calls to Total (two endpoint checks plus about six bisection steps), where Total is a function implementing the first pseudocode snippet in this answer. Since you typically need this only once per program invocation, the overhead is truly irrelevant.
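
The search can be sketched in Python as follows. The names total_nodes and find_n are assumptions for illustration; the sketch uses int.bit_length() as a safe upper bound on log2(total), clamps the lower bound to 1, and returns None for tree sizes that no N produces. It relies on the total growing strictly with N.

```python
def total_nodes(n):
    """Total number of nodes in a tree with n nodes on the final row."""
    total, left = 1, n
    while left > 1:
        total += left
        left = (left + 1) // 2
    return total


def find_n(total):
    """Return N such that total_nodes(N) == total, or None if no such N exists."""
    n_max = (total + 1) // 2
    n_min = max(1, n_max - total.bit_length())   # bit_length() >= log2(total)
    t_min, t_max = total_nodes(n_min), total_nodes(n_max)
    if t_min == total:
        return n_min
    if t_max == total:
        return n_max
    if t_min > total or t_max < total:
        return None                              # total lies outside the bracket
    while True:
        n = (n_min + n_max) // 2
        if n == n_min:
            return None                          # bracket closed without a match
        t = total_nodes(n)
        if t > total:
            n_max = n
        elif t < total:
            n_min = n
        else:
            return n


print(find_n(11))
print(find_n(4))
```

Here find_n(11) recovers N = 5, while find_n(4) returns None because no tree built by these rules has exactly 4 nodes.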

You can calculate log2(x) using log(x)/log(2), using the log2() function from <math.h> since C99 (but since double has less precision than uint64_t, I would add +1 to the result, or round it towards positive infinity using ceil(), just to be sure), or even using a simple loop:

    Function ulog2(value):
        result = 0
        While (value > 0):
            result = result + 1
            value = value / 2
        End While
        Return result
    End Function

where once again, / denotes integer division.
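
In Python the same loop could be written as in the sketch below. Note that it counts the binary digits of value, so for value >= 1 it returns floor(log2(value)) + 1, which errs on the high side as suggested above; Python's built-in int.bit_length() gives the same result.

```python
def ulog2(value):
    """Number of binary digits in value; an upper bound on log2(value) for value >= 1."""
    result = 0
    while value > 0:
        result += 1
        value //= 2   # integer division
    return result


print(ulog2(64), (64).bit_length())
```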