[LUCENE-6585] Make ConjunctionDISI flatten sub ConjunctionDISI instances - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 5.3, 6.0
Component/s: None
Labels:
None

Lucene Fields:

New

Description

Today ConjunctionDISI wraps some sub (two-phase) iterators. I would like to improve it by flattening sub iterators when they implement ConjunctionDISI. In practice, this would make "+A +(+B +C)" be executed more like "+A +B +C" (only in terms of matching, scoring would not change).

My motivation for this is that if we don't flatten and are unlucky, we can sometimes hit some worst cases. For instance consider the 3 following postings lists (sorted by increasing cost):

A: 1, 1001, 2001, 3001, ...
C: 0, 2, 4, 6, 8, 10, 12, 14, ...
B: 1, 3, 5, 7, 9, 11, 13, 15, ...

If we run "+A +B +C", then everything works fine, we use A as a lead, and advance B 1000 by 1000 to find the next match (if any).

However if we run "+A +(+B +C)", then we would iterate B and C 2 by 2 over the entire doc ID space when trying to find the first match which occurs on or after A:1.

This is an extreme example which is unlikely to happen in practice, but flattening would also help a bit on some more common cases. For instance imagine that A, B and C have respective costs of 100, 10 and 1000. If you search for "+A +(+B +C)", then we will use the most costly iterator (C) to confirm matches of B (the least costly iterator, used as a lead) while it would have been more efficient to confirm matches of B with A first, since A is less costly than C.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

LUCENE-6585.patch
25/Jun/15 02:29
6 kB
Ryan Ernst
LUCENE-6585.patch
19/Jun/15 15:04
5 kB
Ryan Ernst

Activity

People

Assignee:: Unassigned

Reporter:: Adrien Grand

Votes:: 2 Vote for this issue

Watchers:: 8 Start watching this issue

Dates

Created:: 18/Jun/15 23:42

Updated:: 28/Aug/22 14:36

Resolved:: 25/Jun/15 03:14