Following the advice of jpountz in LUCENE-8623, I have investigated using radix selection when merging segments instead of sorting the data at the beginning. The results are pretty promising when running the Lucene geo benchmarks:
||Approach||Index time (sec): Dev||Index time (sec): Base||Index time: Diff||Force merge time (sec): Dev||Force merge time (sec): Base||Force merge time: Diff||Index size (GB): Dev||Index size (GB): Base||Index size: Diff||Reader heap (MB): Dev||Reader heap (MB): Base||Reader heap: Diff||
Edited: changed the table formatting to a Jira table.
In 2D the index throughput is more or less equal, but for higher dimensions the impact is quite big. In all cases the merging process requires much less disk space. I am attaching plots showing the different behaviour, and I am opening a pull request.
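
To illustrate the idea of selecting instead of sorting, here is a minimal sketch of radix selection over fixed-length byte keys. It is not the code from the pull request: the names `RadixSelectSketch` and `select` are made up for this example, it works on an in-memory array only, and it assumes all keys have the same length and are compared as unsigned bytes. The point is that only the bucket containing position `k` is refined at each byte, so the rest of the data is never fully ordered.

```java
class RadixSelectSketch {

    /**
     * Rearranges values so that values[k] holds the k-th smallest key
     * (unsigned byte order, 0-based). Keys before k are <= it, keys after are >= it.
     * Hypothetical helper for illustration; assumes equal-length keys.
     */
    public static void select(byte[][] values, int k) {
        if (k < 0 || k >= values.length) {
            throw new IllegalArgumentException("k out of range: " + k);
        }
        int from = 0;
        int to = values.length;
        int keyLength = values[0].length;

        // Refine one byte at a time, most significant byte first.
        for (int d = 0; d < keyLength && to - from > 1; d++) {
            // Histogram of byte d over the current range [from, to).
            int[] counts = new int[256];
            for (int i = from; i < to; i++) {
                counts[values[i][d] & 0xFF]++;
            }
            // Find the bucket whose position range contains k.
            int bucket = 0;
            int bucketStart = from;
            while (bucketStart + counts[bucket] <= k) {
                bucketStart += counts[bucket];
                bucket++;
            }
            int bucketEnd = bucketStart + counts[bucket];

            // Move smaller bytes left and larger bytes right of that bucket.
            partition(values, from, to, d, bucket);

            // Only the bucket containing k needs further refinement.
            from = bucketStart;
            to = bucketEnd;
        }
    }

    /** Three-way partition of [from, to) on byte d around the value pivot. */
    private static void partition(byte[][] v, int from, int to, int d, int pivot) {
        int lt = from;
        int i = from;
        int gt = to - 1;
        while (i <= gt) {
            int b = v[i][d] & 0xFF;
            if (b < pivot) {
                swap(v, lt++, i++);
            } else if (b > pivot) {
                swap(v, i, gt--);
            } else {
                i++;
            }
        }
    }

    private static void swap(byte[][] v, int a, int b) {
        byte[] tmp = v[a];
        v[a] = v[b];
        v[b] = tmp;
    }
}
```

Calling `RadixSelectSketch.select(keys, keys.length / 2)` would place the median key at the middle position with smaller keys on its left and larger keys on its right, which is the kind of split a tree build needs, without paying for a full sort up front.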