weixin_39714383 2020-11-30 08:56

Trace compaction

Materialize maintains changes to collections in "traces", each of which initially looks like a log of updates to the collection. This is fine for the first few moments of a demo, but with enough churn going on we will have two problems:

  • Physical compaction: The update batches will not be merged, and each "random access" to a trace will require an amount of work that increases linearly with the number of update batches absorbed into the trace.

    This issue can be addressed with the trace handle's distinguish_since method, which unblocks physical merging. We want to take some care with this, because operators like join need the ability to start from an arrangement that is not ahead of the times they need (they need to be able to put a bookmark down at the times of their other input).

    It is probably safe to casually distinguish_since up to the lower envelope of timestamps we believe we will see on other inputs. (Both compaction knobs are exercised in the code sketch following this list.)

  • Logical compaction: Independently, the logical times at which the updates occur are left at their original values, and a full history of changes is preserved. Even with physical merging, this means that a highly dynamic record, e.g. the total for an accumulation query like TPC-H query 01 or 06, will have a full update history, and further updates to it will repeatedly pay the cost of re-accumulating that history.

    This issue can be addressed with the trace handle's advance_through method, which indicates that users of the handle do not intend to distinguish between logical times not in advance of the argument supplied to advance_through. Differential is then able to consolidate equivalence classes of times, and to maintain a footprint proportional to the current size of the collection plus a trailing window of edits proportional to the slop in advance_through.

    This issue is more complicated, as once we advance a trace handle, we can't go back. Whatever action precipitates this advancement seals off the potential to load up other Kafka sources and join them with the maintained traces at times other than those in advance of whatever we have advanced to. This is possibly something we want the user to explicitly opt into.
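
To make the two knobs concrete, here is a minimal, hedged sketch against differential-dataflow's trace-handle API of roughly this era, in which advance_by performs the logical compaction described above (what is called advance_through here) and distinguish_since unblocks physical merging. Method names and signatures have shifted across releases (later ones use set_logical_compaction / set_physical_compaction), so treat this as illustrative rather than as Materialize code.

```rust
// Illustrative sketch, not Materialize code. Assumes a pre-0.11
// differential-dataflow TraceReader API in which `advance_by` performs
// logical compaction and `distinguish_since` performs physical compaction.
use differential_dataflow::input::Input;
use differential_dataflow::operators::arrange::ArrangeByKey;
use differential_dataflow::trace::TraceReader;

fn main() {
    timely::execute_directly(move |worker| {
        // Arrange a keyed collection; keep the input handle and a handle
        // to the arrangement's trace.
        let (mut input, mut trace) = worker.dataflow::<u64, _, _>(|scope| {
            let (input, collection) = scope.new_collection::<(u32, u32), isize>();
            (input, collection.arrange_by_key().trace)
        });

        for round in 0..100u64 {
            input.insert((round as u32 % 10, round as u32));
            input.advance_to(round + 1);
            input.flush();
            worker.step();

            // Logical compaction: we promise not to distinguish logical
            // times earlier than `round`, so differential may consolidate
            // each key's history down to its current accumulation (plus a
            // trailing window bounded by how far this frontier lags).
            trace.advance_by(&[round]);

            // Physical compaction: we promise not to position a new reader
            // before `round`, which unblocks batch merging and keeps random
            // access from degrading linearly in the number of batches. A
            // join would hold this frontier back at the lower envelope of
            // times it still needs from its other input.
            trace.distinguish_since(&[round]);
        }
    });
}
```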

The first issue is fundamentally about physical representation, and we can probably do whatever we want here as long as things don't crash and we don't run out of memory. Ideally I can fix some things in differential so that this is never something a human needs to deal with.

The second issue affects the logical output of the computation, and we want to think very seriously about how to stitch it into the consistency guarantees. As a start, I would propose a command that advances the lower bound of all collections in a (database, timespace, time domain), explicitly integrated into the command stream, so that on replay each use of each collection has a well-specified meaning (I hope).
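
To make the proposal concrete, here is a hypothetical sketch of how such a compaction command might sit in a durable command stream; the type and field names are invented for illustration and are not Materialize's actual command types.

```rust
// Hypothetical sketch (names invented, not Materialize's actual types):
// record compaction as an explicit command in the durable command stream,
// so that replaying the stream assigns the same meaning to every use of
// every collection in the named (database, time domain).
#[derive(Debug)]
enum Command {
    /// A stand-in for ordinary catalog / query commands.
    CreateView { name: String, sql: String },
    /// Advance the logical lower bound of all collections in a time domain.
    /// After this command, reads at times not in advance of `frontier` are
    /// no longer well-defined.
    AdvanceDomain { database: String, time_domain: String, frontier: Vec<u64> },
}

fn main() {
    // Because the advancement is a command like any other, replaying the log
    // reproduces exactly which reads were defined at which points in history.
    let log = vec![
        Command::CreateView { name: "totals".into(), sql: "SELECT ...".into() },
        Command::AdvanceDomain {
            database: "materialize".into(),
            time_domain: "kafka-ingest".into(),
            frontier: vec![12_345],
        },
    ];
    for command in &log {
        println!("replay: {:?}", command);
    }
}
```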

This question originates from the open source project: MaterializeInc/materialize


5 replies

  • weixin_39714383 2020-11-30 08:56

    Related to trace compaction: when we "import" a compacted trace, it is likely very important to pick a time at which we intend for it to "start", and to advance the timestamps in the trace to that time. For example, if we have a collection that has been evolving for a while and the current time is "time", a new query using the associated trace should probably advance each of the times in the trace to "time", so that the stream of changes references only current times rather than a big splat of historical times.

    This is operationally not too hard, but it probably involves some consultation. If anyone else ends up wanting to pick it off, check in with me about trace wrappers that do this. (A rough sketch of the import pattern follows.)
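
    Under the same pre-0.11 API assumptions as the earlier sketch, the pattern might look roughly like the following: advance the handle's logical frontier to the chosen start time before importing, so the batches replayed into the new dataflow can be advanced toward that time instead of replaying a splat of history. The per-update advancement itself is what the trace wrappers would do; this only shows where they would hook in.

    ```rust
    // Hedged sketch of "import at a chosen start time", assuming the
    // pre-0.11 differential-dataflow API (`advance_by`, `import`). The
    // trace wrappers mentioned above would advance each replayed update's
    // time on read.
    use differential_dataflow::input::Input;
    use differential_dataflow::operators::arrange::ArrangeByKey;
    use differential_dataflow::operators::JoinCore;
    use differential_dataflow::trace::TraceReader;

    fn main() {
        timely::execute_directly(move |worker| {
            // Dataflow 1: maintain an arrangement that evolves for a while.
            let (mut input, mut trace) = worker.dataflow::<u64, _, _>(|scope| {
                let (input, collection) = scope.new_collection::<(u32, u32), isize>();
                (input, collection.arrange_by_key().trace)
            });
            for t in 0..10u64 {
                input.insert((t as u32 % 3, t as u32));
                input.advance_to(t + 1);
                input.flush();
                worker.step();
            }

            // Pick the time the new query should "start" at, and advance the
            // handle's logical frontier to it before importing.
            let start = 10u64;
            trace.advance_by(&[start]);

            // Dataflow 2: import the compacted trace and attach a new
            // operator. Its change stream should reference times advanced
            // to `start` rather than the full historical record.
            worker.dataflow::<u64, _, _>(|scope| {
                let arranged = trace.import(scope);
                arranged
                    .join_core(&arranged, |key, v1, v2| Some((*key, *v1, *v2)))
                    .inspect(|x| println!("joined: {:?}", x));
            });

            drop(input);
            while worker.step() {}
        });
    }
    ```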

