Merging performance is still very slow when the geometry+index exceeds the size of memory because of all the paging.
For 2 billion points, there are about 49GB of geometry and 66GB of index. (The geometry records are variable-size, but apparently average 24 bytes; the index is a fixed 32 bytes per record.) At 8 sec/GB for I/O, that's (49 + 66) GB × 8 sec/GB ≈ 920 seconds, or about 15 minutes, for just one linear pass.
Linearizing the merge: There were 1252 chunks during the sort. If each pass merged chunks two at a time instead of merging from all 1252 at once, it would take log₂(1252) ≈ 10.3 passes to produce the complete file, so about 2.5 hours of I/O from 10.3 linear runs through the data (see the cost sketch below). Does that actually come out any better than letting the single all-at-once merge thrash?
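For comparison, here is a minimal sketch of the cost model implied by those numbers. This is not tippecanoe code; the data size, chunk count, and I/O rate are just the estimates above, the fan-in values other than 2 and 1252 are illustrative, and the single-pass case ignores the paging cost that motivates this whole note:

```cpp
// Rough cost model: total linear I/O time for an external merge as a
// function of how many chunks are merged per pass.
#include <cmath>
#include <cstdio>
#include <initializer_list>

int main() {
    const double data_gb = 49.0 + 66.0;  // geometry + index, from the estimate above
    const double sec_per_gb = 8.0;       // assumed linear I/O rate
    const int chunks = 1252;             // sort chunks observed during the run

    // Merging f chunks per pass takes roughly log_f(chunks) passes,
    // and each pass reads and rewrites the whole data set once.
    for (int fanin : {2, 4, 16, 64, 1252}) {
        double passes = std::log((double) chunks) / std::log((double) fanin);
        double hours = passes * data_gb * sec_per_gb / 3600.0;
        std::printf("fan-in %4d: %4.1f passes, ~%.1f hours of linear I/O\n",
                    fanin, passes, hours);
    }
    return 0;
}
```

With fan-in 2 this gives the ~10.3 passes and ~2.6 hours quoted above; the fan-in 1252 row is the current single merge, which is only cheap if it doesn't have to page.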
Larger sort chunks would reduce the number of passes. (Do Macs still crash when doing random-access writes to a 2GB map?) The ratio of the geometry size to the index size could be used to estimate how many geometry records would fit in memory per chunk, as sketched below.
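A back-of-the-envelope sketch of that estimate, assuming a hypothetical 8GB memory budget and the per-record sizes measured above. Whether a chunk has to hold both the geometry and the index or only the index depends on how the sort is structured; the sketch conservatively assumes both:

```cpp
// Not part of tippecanoe: estimate how many records one in-memory sort
// chunk could hold, and how many chunks that implies for 2 billion points.
#include <cstdio>

int main() {
    const double mem_bytes = 8.0 * 1024 * 1024 * 1024;  // assumed memory budget for sorting
    const double geom_per_record = 49.0e9 / 2.0e9;      // ~24 bytes, from the measured ratio
    const double index_per_record = 32.0;               // fixed index record size
    const double total_records = 2.0e9;

    double records_per_chunk = mem_bytes / (geom_per_record + index_per_record);
    double num_chunks = total_records / records_per_chunk;
    std::printf("~%.0fM records per chunk, ~%.0f chunks instead of 1252\n",
                records_per_chunk / 1e6, num_chunks);
    return 0;
}
```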
The index could maybe be a little smaller by using start+len instead of start…end and by packing start and sequence into smaller integers.
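One possible packed layout, purely as an illustration of that idea; the field names and widths here are assumptions, not the actual tippecanoe index struct:

```cpp
// Hypothetical packed index record: geometry offset + length instead of
// start...end, with the offset and sequence number narrowed.
#include <cstdint>

struct packed_index {
    uint64_t ix;          // spatial sort key, still needs the full 64 bits
    uint64_t start : 40;  // geometry offset: 40 bits addresses up to 1TB, well past 49GB
    uint64_t len : 24;    // geometry length replaces the redundant end offset
    uint32_t seq;         // input sequence: 32 bits covers ~4.3 billion records
};
// ~24 bytes per record on common ABIs, versus 32 bytes per record now.
```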