[LUCENE-8829] TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None
Lucene Fields: New

Description

While investigating LUCENE-8819, I realized that the order of the results returned by TopDocs#merge depends indirectly on the number of collectors involved in the merge. This is troubling for two reasons: 1) the number of collectors involved in a merge is cost-based and depends directly on the number of slices created in the parallel searcher case; 2) the TopN hits code path invokes merge with a single collector. As a result, running the same TopN query with a single-threaded searcher and with a parallel searcher can return the results in different orders, which breaks an invariant that should hold.

This happens because of the subtle way TopDocs#merge sets shardIndex on each ScoreDoc while populating the priority queue used for merging: shardIndex is set to the ordinal of the collector that produced the hit. This means that shardIndex depends on the number of collectors (and on how hits are distributed among them), even for the same set of hits.

When no sort order is specified, shardIndex is used to break ties between hits with equal scores. Consequently, the same hits can be returned in different orders when they are assigned different shardIndex values.
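The following Java sketch (Java 16+, for records) models the tie-breaking described above under simplifying assumptions. `Hit`, `Ranked`, and `merge` are hypothetical names, not Lucene's `ScoreDoc` or `TopDocs#merge`, and the final fallback to a hit's position within its collector is an assumption of this model. The same four equally scored hits come back in a different order depending on whether one collector gathered them or they were split across two slices, where the slice holding docs 2 and 3 happens to be collector 0.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified model of the current tie-breaking; NOT Lucene's TopDocs#merge.
class ShardIndexTieBreak {

    // Hypothetical stand-in for ScoreDoc: a global doc ID and a score.
    record Hit(int doc, float score) {}

    // A hit tagged with the ordinal of the collector that produced it
    // (shardIndex) and its position within that collector's results (hitIndex).
    record Ranked(Hit hit, int shardIndex, int hitIndex) {}

    // Each inner list is one collector's hits, assumed already sorted by
    // score descending and then doc ID ascending.
    static List<Integer> merge(List<List<Hit>> perCollector) {
        List<Ranked> all = new ArrayList<>();
        for (int shard = 0; shard < perCollector.size(); shard++) {
            List<Hit> hits = perCollector.get(shard);
            for (int i = 0; i < hits.size(); i++) {
                // shardIndex is simply the collector's position in the input list.
                all.add(new Ranked(hits.get(i), shard, i));
            }
        }
        // Score descending; equal scores fall back to shardIndex, then hitIndex.
        all.sort(Comparator.comparingDouble((Ranked r) -> r.hit().score()).reversed()
                .thenComparingInt(Ranked::shardIndex)
                .thenComparingInt(Ranked::hitIndex));
        List<Integer> docs = new ArrayList<>();
        for (Ranked r : all) {
            docs.add(r.hit().doc());
        }
        return docs;
    }

    public static void main(String[] args) {
        // Four hits with identical scores, global doc IDs 0..3.
        List<Hit> oneCollector = List.of(new Hit(0, 1f), new Hit(1, 1f), new Hit(2, 1f), new Hit(3, 1f));
        // The same hits split across two slices; the slice holding docs 2 and 3
        // happens to be collector 0.
        List<List<Hit>> twoSlices = List.of(
                List.of(new Hit(2, 1f), new Hit(3, 1f)),
                List.of(new Hit(0, 1f), new Hit(1, 1f)));
        System.out.println(merge(List.of(oneCollector))); // [0, 1, 2, 3]
        System.out.println(merge(twoSlices));             // [2, 3, 0, 1]
    }
}
```

Within each collector, equally scored hits are assumed to be in doc ID order already, so the single-collector merge preserves doc ID order while the two-slice merge does not.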

I propose removing shardIndex from the default tie-breaking mechanism and replacing it with the doc ID. Doc ID order is already the de facto order expected during collection, so it makes sense to use the same criterion to break ties when scores are equal.
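A minimal sketch of the proposed tie-break, using a hypothetical, simplified model (Java 16+; `Hit` and `merge` are illustrative names, not Lucene's API): ties on score are broken by doc ID instead of by collector ordinal, so both slicings of the same hits produce the same order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the proposed tie-break: score descending, then doc ID ascending.
// Simplified model; NOT Lucene's TopDocs#merge implementation.
class DocIdTieBreak {

    // Hypothetical stand-in for ScoreDoc: a global (top-level) doc ID and a score.
    record Hit(int doc, float score) {}

    // Each inner list is one collector's hits; the collector ordinal is ignored.
    static List<Integer> merge(List<List<Hit>> perCollector) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perCollector) {
            all.addAll(hits);
        }
        // The collector ordinal no longer participates, so slicing cannot change the order.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score()).reversed()
                .thenComparingInt(Hit::doc));
        List<Integer> docs = new ArrayList<>();
        for (Hit h : all) {
            docs.add(h.doc());
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Hit> oneCollector = List.of(new Hit(0, 1f), new Hit(1, 1f), new Hit(2, 1f), new Hit(3, 1f));
        List<List<Hit>> twoSlices = List.of(
                List.of(new Hit(2, 1f), new Hit(3, 1f)),
                List.of(new Hit(0, 1f), new Hit(1, 1f)));
        System.out.println(merge(List.of(oneCollector))); // [0, 1, 2, 3]
        System.out.println(merge(twoSlices));             // [0, 1, 2, 3]
    }
}
```

This assumes the doc IDs being compared are top-level IDs from a single index (as in the parallel IndexSearcher case), so they are unique across slices. When merging results from separate indices, doc IDs can collide and an additional tie-break would still be needed.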

CC: ivera

Attachments

- LUCENE-8829.patch (13 kB), Atri Sharma, 11/Jun/19 10:59
- LUCENE-8829.patch (12 kB), Atri Sharma, 11/Jun/19 07:28
- LUCENE-8829.patch (11 kB), Atri Sharma, 06/Jun/19 10:14
- LUCENE-8829.patch (5 kB), Atri Sharma, 05/Jun/19 08:11

Issue Links

relates to
- LUCENE-8819 org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure (Open)
- LUCENE-8757 Better Segment To Thread Mapping Algorithm (Closed)

People

Assignee: Unassigned
Reporter: Atri Sharma
Votes: 0
Watchers: 5

Dates

Created: 04/Jun/19 18:00
Updated: 28/Aug/22 15:46
Resolved: 03/Jul/19 14:45