[LUCENE-8673] Use radix partitioning when merging dimensional points - ASF JIRA


Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 8.x, 9.0
Component/s: None
Labels: None
Lucene Fields: New

Description

Following the advice of jpountz in LUCENE-8623, I have investigated using radix selection when merging segments instead of sorting the data at the beginning. The results of running the Lucene geo benchmarks are pretty promising:

| Approach | Index time (s): Dev | Index time (s): Base | Index time: Diff | Force merge time (s): Dev | Force merge time (s): Base | Force merge time: Diff | Index size (GB): Dev | Index size (GB): Base | Index size: Diff | Reader heap (MB): Dev | Reader heap (MB): Base | Reader heap: Diff |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| points | 241.5 | 235.0 | 3% | 157.2 | 157.9 | -0% | 0.55 | 0.55 | 0% | 1.57 | 1.57 | 0% |
| shapes | 416.1 | 650.1 | -36% | 306.1 | 603.2 | -49% | 1.29 | 1.29 | 0% | 1.61 | 1.61 | 0% |
| geo3d | 261.0 | 360.1 | -28% | 170.2 | 279.9 | -39% | 0.75 | 0.75 | 0% | 1.58 | 1.58 | 0% |
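The Diff columns are consistent with the relative change of Dev with respect to Base, so a negative value means the Dev run took less time. Worked through for the shapes row:

```latex
\text{Diff} = \frac{\text{Dev} - \text{Base}}{\text{Base}} \times 100\%
\qquad
\frac{416.1 - 650.1}{650.1} \approx -36\%, \quad
\frac{306.1 - 603.2}{603.2} \approx -49\%
```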


In 2D (points) the indexing throughput is more or less unchanged, but for higher dimensions (shapes and geo3d) the impact is quite big: index time drops by 28-36% and force-merge time by 39-49%. In all cases the merging process requires much less disk space. I am attaching plots showing the different behaviour and opening a pull request.
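The description contrasts sorting all points up front with radix selection. The following is a minimal Java sketch of the general idea, not the code from the pull request: building a BKD-style tree only needs the median along one dimension at each split, and an MSB radix select finds it with one 256-entry histogram and one three-way partition per byte, without fully sorting the data. The class and method names (`RadixSelectSketch`, `select`, `buildLeaves`), the fixed-length packed `byte[]` encoding, the in-memory arrays and the round-robin choice of split dimension are all assumptions for illustration; Lucene also has to handle point sets that do not fit in memory and may choose split dimensions differently.

```java
import java.util.Random;

/**
 * Minimal sketch of MSB radix selection over packed byte[] points, used to
 * build BKD-style leaves by repeated median partitioning instead of sorting.
 * All names here are hypothetical; this is not Lucene's implementation.
 */
public final class RadixSelectSketch {

  /** Unsigned value of byte d of dimension dim in a packed point. */
  static int byteAt(byte[] point, int dim, int bytesPerDim, int d) {
    return point[dim * bytesPerDim + d] & 0xFF;
  }

  static void swap(byte[][] points, int i, int j) {
    byte[] tmp = points[i];
    points[i] = points[j];
    points[j] = tmp;
  }

  /**
   * Reorders points[from, to) so that points[k] holds the k-th smallest value of
   * dimension dim (unsigned, big-endian), every point before k is <= it and
   * every point after k is >= it. Precondition: from <= k < to.
   */
  static void select(byte[][] points, int from, int to, int k, int dim, int bytesPerDim) {
    for (int d = 0; d < bytesPerDim && to - from > 1; d++) {
      // Histogram of byte d over the still-unresolved range.
      int[] histogram = new int[256];
      for (int i = from; i < to; i++) {
        histogram[byteAt(points[i], dim, bytesPerDim, d)]++;
      }
      // Find the bucket (byte value) whose slice contains position k.
      int bucket = 0;
      int start = from;
      while (start + histogram[bucket] <= k) {
        start += histogram[bucket];
        bucket++;
      }
      // Three-way partition on byte d: < bucket, == bucket, > bucket.
      int lt = from, i = from, gt = to - 1;
      while (i <= gt) {
        int b = byteAt(points[i], dim, bytesPerDim, d);
        if (b < bucket) {
          swap(points, lt++, i++);
        } else if (b > bucket) {
          swap(points, i, gt--);
        } else {
          i++;
        }
      }
      // Only the "== bucket" slice [lt, gt] still needs ordering around k.
      from = lt;
      to = gt + 1;
    }
  }

  /**
   * Recursively splits [from, to) at its midpoint until each range holds at most
   * maxPointsInLeaf points, cycling through dimensions (a simplification).
   * Returns the number of leaves.
   */
  static int buildLeaves(byte[][] points, int from, int to, int dim,
                         int numDims, int bytesPerDim, int maxPointsInLeaf) {
    if (to - from <= maxPointsInLeaf) {
      return 1;
    }
    int mid = (from + to) >>> 1;
    select(points, from, to, mid, dim, bytesPerDim);
    int nextDim = (dim + 1) % numDims;
    return buildLeaves(points, from, mid, nextDim, numDims, bytesPerDim, maxPointsInLeaf)
        + buildLeaves(points, mid, to, nextDim, numDims, bytesPerDim, maxPointsInLeaf);
  }

  public static void main(String[] args) {
    int numDims = 2, bytesPerDim = 4;
    byte[][] points = new byte[1000][numDims * bytesPerDim];
    Random random = new Random(42);
    for (byte[] p : points) {
      random.nextBytes(p);
    }
    int leaves = buildLeaves(points, 0, points.length, 0, numDims, bytesPerDim, 64);
    System.out.println("leaves: " + leaves); // prints "leaves: 16"
  }
}
```

Each byte level costs one linear pass over the unresolved range, so finding a split point is O(n × bytesPerDim) instead of the O(n log n) of a full sort. Running `main` prints `leaves: 16`: the 1000 points are halved four times, down to leaves of 62 or 63 points.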

Attachments


- LatLonShape.png, 8 kB, Ignacio Vera, 31/Jan/19 12:31
- LatLonShape.png, 8 kB, Ignacio Vera, 03/Feb/19 13:08
- LatLonShape.png, 8 kB, Ignacio Vera, 05/Feb/19 07:12
- LatLonPoint.png, 7 kB, Ignacio Vera, 31/Jan/19 12:31
- LatLonPoint.png, 7 kB, Ignacio Vera, 03/Feb/19 13:08
- LatLonPoint.png, 7 kB, Ignacio Vera, 05/Feb/19 07:12
- Geo3D.png, 8 kB, Ignacio Vera, 31/Jan/19 12:31
- Geo3D.png, 8 kB, Ignacio Vera, 03/Feb/19 13:08
- Geo3D.png, 8 kB, Ignacio Vera, 05/Feb/19 07:12

Issue Links

Links to: GitHub Pull Request #556


People

Assignee: Ignacio Vera

Reporter: Ignacio Vera

Votes: 0

Watchers: 4

Dates

Created: 31/Jan/19 12:31

Updated: 28/Aug/22 15:41

Resolved: 07/Feb/19 07:20

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 5h 40m