weixin_39547392 2020-11-29 14:24

Thrashing while merging

Merging performance is still very slow when the geometry+index exceeds the size of memory because of all the paging.

For 2 billion points, there are about 49 GB of geometry and 66 GB of index (variable-size geometry records, apparently averaging 24 bytes; a fixed 32 bytes/record for the index). At 8 sec/GB for I/O, that's roughly 15 minutes for just one linear pass over the ~115 GB.

Linearizing the merge: there were 1252 chunks during the sort. If each pass merged chunks pairwise instead of merging from all 1252 at once, it would take log₂(1252) ≈ 10.3 passes to produce the complete file, so about 2.5 hours of I/O from 10.3 linear runs through the data. Does this actually come out any better than letting it thrash?

Larger sort chunks would reduce the number of passes. (Do Macs still crash when doing random access writes to a 2GB map?) Use the ratio of the geometry size to the index size to estimate how many geometry records would fit in memory.

The index could maybe be a little smaller by using start+len instead of start…end and by packing start and sequence into smaller integers.
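For example (a hypothetical layout, not tippecanoe's actual index struct): storing a 40-bit start plus a 24-bit length instead of two full 64-bit offsets, and shrinking the sequence number to 32 bits, would cut a record from 32 to 24 bytes:

```cpp
#include <cstdint>

// Hypothetical packed index record -- NOT the actual tippecanoe layout.
struct PackedIndex {
    uint64_t ix;          // 64-bit location index (the sort key)
    uint64_t start : 40;  // geometry file offset, good to 1 TB
    uint64_t len : 24;    // record length, good to 16 MB
    uint32_t seq;         // original input sequence, good to 4G features
};

// 8 (ix) + 8 (packed start+len) + 4 (seq) + 4 padding on common ABIs.
static_assert(sizeof(PackedIndex) == 24, "24 bytes vs. 32 with start+end");
```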

This question comes from the open-source project: mapbox/tippecanoe


13 replies

  • weixin_39547392 2020-11-29 14:24

    Idea: radix sort. Instead of concatenating the whole index and geometry together before sorting, split it out into 500 (or however many open files we get) files by index prefix, sort/merge each of them, and then concatenate them back together at the end.
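    A sketch of the bucketing step, assuming the sort key is a 64-bit index whose high bits are the spatial prefix (the function name is made up):

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <cstdio>

    // Route a record to one of 2^bits bucket files by its index prefix.
    // Because the bucket number is the *most significant* bits of the sort
    // key, sorted buckets concatenate back into one fully sorted file.
    size_t bucket_for(uint64_t ix, unsigned bits) {
        return (size_t)(ix >> (64 - bits));
    }

    int main() {
        unsigned bits = 9;  // 512 buckets, close to the ~500 open files above
        assert(bucket_for(0x0000000000000000ULL, bits) == 0);
        assert(bucket_for(0xFFFFFFFFFFFFFFFFULL, bits) == 511);
        // Keys in a lower bucket always sort before keys in a higher one.
        assert(bucket_for(0x1000000000000000ULL, bits) <
               bucket_for(0x2000000000000000ULL, bits));
        printf("ok\n");
        return 0;
    }
    ```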

    In the simplest case, the radices are z4 or z5 tiles. If the data isn't worldwide, we can do better just by keeping track of the highest and lowest index seen. Better yet would be some tracking of the statistical distribution of prefixes, but that would be hard.
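    For scale: zoom level z has 4^z tiles, so z4 gives 256 radices and z5 gives 1024, bracketing the ~500 open files mentioned above:

    ```cpp
    #include <cstdio>

    int main() {
        // Tiles at zoom z: 4^z = 2^(2z).
        for (int z = 4; z <= 5; z++) {
            printf("z%d: %lld tiles\n", z, 1LL << (2 * z));  // 256, 1024
        }
        return 0;
    }
    ```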

    It might also be possible to use more radices than there are open files: give each radix its own offset range in a giant memory-mapped file with holes, if that doesn't have bad performance.
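    A sketch of the holes idea with POSIX calls (the 1 GB stride is an arbitrary example): on most filesystems, ranges that are never written consume no disk blocks, so each radix can be written at its own fixed offset in one sparse file.

    ```cpp
    #include <cassert>
    #include <fcntl.h>
    #include <unistd.h>

    int main() {
        // One fixed-size region per radix inside a single sparse file.
        const off_t stride = 1LL << 30;  // 1 GB reserved per radix (example)
        int fd = open("/tmp/radix.sparse", O_RDWR | O_CREAT | O_TRUNC, 0644);
        assert(fd >= 0);

        const char rec[] = "record";
        // Write radix 0 and radix 700 at their own offsets; everything in
        // between remains a hole and takes no space on disk.
        assert(pwrite(fd, rec, sizeof(rec), 0 * stride) ==
               (ssize_t)sizeof(rec));
        assert(pwrite(fd, rec, sizeof(rec), 700 * stride) ==
               (ssize_t)sizeof(rec));

        close(fd);
        unlink("/tmp/radix.sparse");
        return 0;
    }
    ```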

