Data structure for efficiently indexing objects by interval

I'm currently playing with some ideas related to CRF-ish work, and there is one I need help with.

Minimal Problem

I've got a bunch of function objects (think something expensive like neural nets). They are applied to a linear buffer (think an array of floats or bytes), but over varying intervals. They look like this (think of Start and End as "apply Object to buf[Start:End]"):

| Object | Start | End |
|--------|-------|-----|
| A      | 0     | 4   |
| B      | 4     | 10  |
| C      | 13    | 15  |

Interval Characteristics

There may be some skips (for example, compare the start of C with the end of B).
There will definitely be changes to the intervals, both growing and shrinking (for example, B may change from [4:10] to [4:12]).
When this happens, the object(s) associated with the changed intervals may have to be reapplied.
If a changed interval overlaps with another interval, both objects will have to be reapplied. For example, if B changes from [4:10] to [3:12], A would have to be applied to the range [0:3] and B would have to be applied to the range [3:12].
Depending on the operation, downstream intervals will have to be updated as well, but their objects will not necessarily have to be reapplied. For example, if it was an insertion that changed B from [4:10] to [4:12], then both bounds of C also increase by 2 (from [13:15] to [15:17]), but this does not trigger a reapplication of C.
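The insertion case can be sketched in a few lines of Go. This is only an illustration of the rules above, not a proposed solution: the Span type, its Dirty flag and the insertAt function are hypothetical names, and the sketch assumes that an insertion at an index inside an interval grows that interval.

```go
package main

import "fmt"

// Span is a hypothetical record: Name's object is applied to buf[Start:End].
type Span struct {
	Name       string
	Start, End int
	Dirty      bool // set when the object has to be reapplied
}

// insertAt models inserting n elements into the buffer at index idx.
// The spans are assumed not to overlap.
func insertAt(spans []Span, idx, n int) {
	for i := range spans {
		s := &spans[i]
		switch {
		case s.Start <= idx && idx < s.End:
			// The insertion lands inside this span: it grows and must be reapplied.
			s.End += n
			s.Dirty = true
		case s.Start > idx:
			// Downstream span: only its coordinates move, no reapplication.
			s.Start += n
			s.End += n
		}
	}
}

func main() {
	spans := []Span{{"A", 0, 4, false}, {"B", 4, 10, false}, {"C", 13, 15, false}}
	insertAt(spans, 6, 2) // insert two elements inside B
	for _, s := range spans {
		fmt.Printf("%s [%d:%d] reapply=%v\n", s.Name, s.Start, s.End, s.Dirty)
	}
}
```

With the table from the question, B becomes [4:12] and is flagged for reapplication, while C moves to [15:17] without being flagged.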

Program Characteristics

The intervals change a lot (it's a machine learning training loop).
Supported forms of interval updates are: insert, delete, shiftleft, shiftright. The latter two are the same as insert/delete but applied at the ends of the intervals.
Changes to an interval typically come as a tuple (index, size) or as a single index.
Application of a function is a fairly expensive operation and is CPU bound.
However, since I am using Go, a couple of mutexes plus goroutines solve the majority of that problem (there are some finer points, but by and large they can be ignored).
One epoch can have anywhere from 5-60ish interval-object pairs.
Buffer is linear, but not necessarily contiguous.

Task

The tasks can be summarized as follows:

Query by index: returns the interval and the object associated with the interval
Update interval: must also update downstream if necessary (which is the majority case)
Insertion of new intervals: must also update downstream
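To make the first task concrete, here is a minimal baseline sketch, assuming the intervals are kept in a slice sorted by Start and never overlap. The Interval type and the find function are hypothetical names introduced for illustration. With only 5 to 60 pairs per epoch, a binary search over such a slice may already be enough, though that is an assumption, not a measurement.

```go
package main

import (
	"fmt"
	"sort"
)

// Interval is a hypothetical record: Obj is applied to buf[Start:End].
type Interval struct {
	Start, End int
	Obj        string // stands in for the real function object
}

// find returns the position in ivs of the interval containing idx,
// or -1 if idx falls into a gap or outside every interval.
// ivs must be sorted by Start and must not overlap.
func find(ivs []Interval, idx int) int {
	// Binary search for the first interval that ends after idx.
	i := sort.Search(len(ivs), func(i int) bool { return ivs[i].End > idx })
	if i < len(ivs) && ivs[i].Start <= idx {
		return i
	}
	return -1
}

func main() {
	ivs := []Interval{{0, 4, "A"}, {4, 10, "B"}, {13, 15, "C"}}
	for _, idx := range []int{3, 4, 11, 14} {
		if i := find(ivs, idx); i >= 0 {
			fmt.Printf("index %d -> %s [%d:%d]\n", idx, ivs[i].Obj, ivs[i].Start, ivs[i].End)
		} else {
			fmt.Printf("index %d -> no interval\n", idx)
		}
	}
}
```

Updates and insertions of new intervals would still have to walk the downstream part of the slice to shift it, which is linear in the number of intervals.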

What I've Tried

Map with intervals as keys. This was a bad idea because I had to know whether a given index that changed was within an interval or not.
Linear structure to keep track of Starts. Discovered a bug immediately when I realized there may be skips.
Linear structures with "holes" to keep track of Starts. This turns out to be similar to a rope.
Ropes and Skip lists. Ended up refactoring what I had into the skiprope package that works for strings. More yak shaving. Yay.
Interval/Segment trees. Implementation is a pain. I also tried a concrete variant of gods/augmentedtree but couldn't get the callbacks to work properly, so I could not evaluate it.

The Question

Is there any good data structure that I'm missing out on that would make these tasks easier?

Am I missing out on something blindingly obvious?

A friend suggested I look up incremental compilation methods because it's similar. An analogy used would be that Roslyn would parse/reparse fragments of text in a ranged fashion. That would be quite similar to my problem - just replace linear buffer of floats with linear buffer of tokens.

The problem is I couldn't find any solid useful information about how Roslyn does it.

1 Answer

douzhongjiu2263 2017-10-22 16:18
This solution isn't particularly memory-efficient, but if I understand you correctly, it should allow for a relatively simple implementation of the functionality you want.

Keep an array or slice funcs of all your function objects, so that they each have a canonical integer index, and can be looked up by that index.

Keep a slice of ints s that is always the same size as your buffer of floats; it maps a particular index in your buffer to a "function index" in the slice of functions. You can use -1 to represent a number that is not part of any interval.

Keep a slice of (int, int) pairs intervals such that intervals[i] contains the start-end indices for the function stored at funcs[i].

I believe this enables you to implement your desired functionality without too much hassle. For example, to query by index i, look up s[i], then return funcs[s[i]] and intervals[s[i]]. When changes occur to the buffer, change s as well, cross-referencing between s and the intervals slice to figure out if neighboring intervals are affected. I'm happy to explain this part in more detail, but I don't totally understand the requirements for interval updates. (When you do an interval insert, does it correspond to an insert in the underlying buffer? Or are you just changing which buffer elements are associated with which functions? In which case, does an insert cause a deletion at the beginning of the next interval? Most schemes should work, but it changes the procedure.)
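A rough Go sketch of the three slices and the query step might look like the following. The Index type, NewIndex, Add and Query are hypothetical names chosen for illustration, and the Func signature is an assumption. Interval updates are left out, since their exact semantics are still open.

```go
package main

import "fmt"

// Func stands in for an expensive function object (assumed signature).
type Func func(buf []float64)

// Index holds the three slices described above.
type Index struct {
	funcs     []Func   // funcs[i] is the i-th function object
	intervals [][2]int // intervals[i] is {start, end} for funcs[i]
	s         []int    // s[j] is the function index owning buf[j], or -1
}

// NewIndex creates an index for a buffer of bufLen elements,
// with every element initially outside all intervals.
func NewIndex(bufLen int) *Index {
	s := make([]int, bufLen)
	for j := range s {
		s[j] = -1
	}
	return &Index{s: s}
}

// Add registers f for buf[start:end] and returns its function index.
func (ix *Index) Add(f Func, start, end int) int {
	id := len(ix.funcs)
	ix.funcs = append(ix.funcs, f)
	ix.intervals = append(ix.intervals, [2]int{start, end})
	for j := start; j < end; j++ {
		ix.s[j] = id
	}
	return id
}

// Query returns the function and interval covering buffer index i.
// The boolean is false when i is out of range or in a gap.
func (ix *Index) Query(i int) (Func, [2]int, bool) {
	if i < 0 || i >= len(ix.s) || ix.s[i] < 0 {
		return nil, [2]int{}, false
	}
	id := ix.s[i]
	return ix.funcs[id], ix.intervals[id], true
}

func main() {
	nop := func(buf []float64) {} // placeholder for a real function object
	ix := NewIndex(15)
	ix.Add(nop, 0, 4)   // A
	ix.Add(nop, 4, 10)  // B
	ix.Add(nop, 13, 15) // C
	_, iv, ok := ix.Query(5)
	fmt.Println(iv, ok)
	_, _, ok = ix.Query(11)
	fmt.Println(ok)
}
```

A query is a constant-time lookup at the cost of one int per buffer element, which is the memory trade-off mentioned at the start.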
This answer was selected by the asker as the best answer.
