Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Lucene Fields: New
Description
TopScoreDocCollector always initializes HitQueue as its priority queue (PQ) implementation and instructs HitQueue to pre-populate itself with sentinels. While this is a great safety mechanism, for very large datasets where the query's selectivity is high, the sentinel population can be redundant and can become a significant bottleneck in itself. Would it make sense to introduce a new parameter in TopScoreDocCollector that uses a heuristic (say, number of hits > 10k) to skip sentinel population?
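To make the trade-off concrete, here is a minimal, self-contained sketch of the two strategies. It is not Lucene's code: it models the queue with `java.util.PriorityQueue` over bare scores, and the class name `SentinelSketch`, its methods, and `SENTINEL_THRESHOLD` are all hypothetical. It also assumes that "number of hits" in the heuristic means the requested queue size, and that real scores are never negative infinity.

```java
import java.util.PriorityQueue;

public class SentinelSketch {
    // Hypothetical cutoff, taken from the "say, number of hits > 10k" suggestion above.
    static final int SENTINEL_THRESHOLD = 10_000;

    /** Pre-populated variant: pays O(numHits) up front, even if few documents match. */
    static float[] topKWithSentinels(float[] scores, int numHits) {
        if (numHits < 1) throw new IllegalArgumentException("numHits must be > 0");
        PriorityQueue<Float> pq = new PriorityQueue<>(numHits); // min-heap on score
        for (int i = 0; i < numHits; i++) {
            pq.add(Float.NEGATIVE_INFINITY); // sentinel: any real score beats it
        }
        for (float s : scores) {
            // The queue is always full, so the hot loop needs no size check.
            if (s > pq.peek()) {
                pq.poll();
                pq.add(s);
            }
        }
        return drain(pq);
    }

    /** Lazy variant: no sentinels, but every collected hit pays a size check. */
    static float[] topKLazy(float[] scores, int numHits) {
        if (numHits < 1) throw new IllegalArgumentException("numHits must be > 0");
        PriorityQueue<Float> pq = new PriorityQueue<>();
        for (float s : scores) {
            if (pq.size() < numHits) {
                pq.add(s);
            } else if (s > pq.peek()) {
                pq.poll();
                pq.add(s);
            }
        }
        return drain(pq);
    }

    /** Heuristic dispatch: skip sentinel population for very large queues. */
    static float[] topK(float[] scores, int numHits) {
        return numHits > SENTINEL_THRESHOLD
            ? topKLazy(scores, numHits)
            : topKWithSentinels(scores, numHits);
    }

    /** Drops leftover sentinels and returns the remaining scores, best first. */
    static float[] drain(PriorityQueue<Float> pq) {
        while (!pq.isEmpty() && pq.peek() == Float.NEGATIVE_INFINITY) {
            pq.poll();
        }
        float[] out = new float[pq.size()];
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = pq.poll(); // min-heap yields ascending order, so fill from the end
        }
        return out;
    }
}
```

Both variants return the same results; they differ only in where the cost lands. With a requested size of 20,000 and three matching documents, the pre-populated variant inserts 20,000 sentinels that are never displaced, while the lazy variant performs three insertions.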
Issue Links
- is related to: LUCENE-10302 PriorityQueue: optimize where we collect then iterate by using O(N) heapify (Open)
- links to