Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Not A Problem
Description
OfflineSorter is likely I/O bound, yet it doesn't really try to relieve I/O. For instance, it always writes the length of each entry on 2 bytes, which is wasteful when used by BKDWriter since all byte[] arrays have exactly the same length. For LatLonPoint, this is a 25% space overhead that we could remove.
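To make the overhead concrete, here is a minimal sketch that compares a 2-byte length prefix per record with a fixed-length layout. It assumes 8-byte records, which is what gives the 25% figure (2 extra bytes per 8 bytes of payload). The class and method names are hypothetical and this is not the actual OfflineSorter or BKDWriter code, only an illustration of the two on-disk layouts.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration only; not Lucene's OfflineSorter implementation.
public class FixedLengthRecordSketch {

    /** Writes every record preceded by its length on 2 bytes (generic variable-length layout). */
    public static byte[] writeLengthPrefixed(byte[][] records) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (byte[] record : records) {
                out.writeShort(record.length); // 2 bytes of overhead per record
                out.write(record);
            }
        }
        return bytes.toByteArray();
    }

    /** Writes records back to back; the reader must already know recordLength. */
    public static byte[] writeFixedLength(byte[][] records, int recordLength) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (byte[] record : records) {
                if (record.length != recordLength) {
                    throw new IllegalArgumentException(
                        "expected " + recordLength + " bytes but got " + record.length);
                }
                out.write(record); // no per-record length
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        int recordLength = 8; // assumption: 8-byte values
        byte[][] records = new byte[1000][recordLength];

        int prefixed = writeLengthPrefixed(records).length;
        int fixed = writeFixedLength(records, recordLength).length;

        System.out.println("length-prefixed bytes: " + prefixed);
        System.out.println("fixed-length bytes:    " + fixed);
        System.out.println("overhead vs payload:   " + (100.0 * (prefixed - fixed) / fixed) + "%");
    }
}
```

With 1,000 records of 8 bytes each, the prefixed layout takes 10,000 bytes and the fixed-length layout 8,000. The fixed-length variant only works if the record size is stored once elsewhere (for example passed to the reader), which is the case for BKDWriter since all values share the same length.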
Doing lightweight compression on the fly might also help.
As a data point, Ignacio told me that after indexing 60M shapes with LatLonShape (1.65B triangles), the index directory was about 265GB and dropped to 57GB when merging was over.