[LUCENE-10302] PriorityQueue: optimize where we collect then iterate by using O(N) heapify - ASF JIRA


Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Lucene Fields: New

Description

Looking at LUCENE-8875 (LargeNumHitsTopDocsCollector.java), I got to wondering whether there was a faster-than-O(N log N) way of loading a PriorityQueue when we provide a bulk array to initialize the heap. It turns out there is: the JDK's PriorityQueue supports this in its constructors, and the comment on its heapify() method says "This classic algorithm due to Floyd (1964) is known to be O(size)". There's another one that may or may not be the same; I haven't looked too closely yet.

I see a number of uses of Lucene's PriorityQueue that first collect values and only afterwards do something with the results (typical / unsurprising). This lends itself to a builder pattern similar to LargeNumHitsTopDocsCollector: first use an array like a list, then move over to the PriorityQueue if/when it gets full (it may not).
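A minimal sketch of the idea in plain Java, assuming a 0-based min-heap stored in an array. The names HeapifySketch, TopNBuilder, add and result are hypothetical and are not Lucene's or the JDK's API. Floyd's heapify sifts down every internal node, from the last one back to the root; because most nodes sit near the bottom and only sift a short distance, the total work is O(N) rather than the O(N log N) of N individual inserts. The builder collects into the array first and builds the heap only once the array fills up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: collect into an array, heapify in O(N) only if/when it fills up. */
public class HeapifySketch {

    /** Restores the min-heap property for the subtree rooted at index i (0-based layout). */
    static <T> void siftDown(T[] heap, int i, int size, Comparator<? super T> cmp) {
        T value = heap[i];
        while (true) {
            int child = 2 * i + 1;                   // left child
            if (child >= size) break;
            if (child + 1 < size && cmp.compare(heap[child + 1], heap[child]) < 0) {
                child++;                             // pick the smaller child
            }
            if (cmp.compare(heap[child], value) >= 0) break;
            heap[i] = heap[child];                   // move the smaller child up
            i = child;
        }
        heap[i] = value;
    }

    /** Floyd (1964): sift down each internal node, last to first; O(size) overall. */
    static <T> void heapify(T[] heap, int size, Comparator<? super T> cmp) {
        for (int i = (size >>> 1) - 1; i >= 0; i--) {
            siftDown(heap, i, size, cmp);
        }
    }

    /** Hypothetical builder that keeps the maxSize largest values according to cmp. */
    static final class TopNBuilder<T> {
        private final T[] buf;
        private final Comparator<? super T> cmp;
        private int size = 0;

        @SuppressWarnings("unchecked")
        TopNBuilder(int maxSize, Comparator<? super T> cmp) {
            if (maxSize <= 0) throw new IllegalArgumentException("maxSize must be > 0");
            this.buf = (T[]) new Object[maxSize];
            this.cmp = cmp;
        }

        void add(T value) {
            if (size < buf.length) {
                buf[size++] = value;                 // list phase: O(1) append, no ordering
                if (size == buf.length) {
                    heapify(buf, size, cmp);         // just became full: build the heap once
                }
            } else if (cmp.compare(value, buf[0]) > 0) {
                buf[0] = value;                      // heap phase: replace the smallest kept value
                siftDown(buf, 0, size, cmp);
            }
        }

        /** Kept values, largest first; if the array never filled, no heap was ever built. */
        List<T> result() {
            List<T> out = new ArrayList<>(Arrays.asList(buf).subList(0, size));
            out.sort((a, b) -> cmp.compare(b, a));
            return out;
        }
    }

    public static void main(String[] args) {
        TopNBuilder<Integer> top3 = new TopNBuilder<>(3, Comparator.naturalOrder());
        for (int v : new int[] {5, 1, 9, 3, 7, 2, 8}) {
            top3.add(v);
        }
        System.out.println(top3.result()); // [9, 8, 7]
    }
}
```

If fewer than maxSize values ever arrive, the "it may not" case above, result() simply sorts the partially filled array and no heap is built at all. For the JDK class, `new java.util.PriorityQueue<>(collection)` on an unsorted collection already takes the O(N) heapify path.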

Attachments


LUCENE_PriorityQueue_Builder_with_heapify.patch (4 kB, David Smiley, 03/Mar/22 14:42)

Issue Links

relates to: LUCENE-8875 Should TopScoreDocCollector Always Populate Sentinel Values? (Closed)


People

Assignee: Unassigned

Reporter: David Smiley

Votes: 0

Watchers: 3

Dates

Created: 09/Dec/21 13:23

Updated: 28/Aug/22 16:32