Details
Description
Today ConjunctionDISI wraps some sub (two-phase) iterators. I would like to improve it by flattening sub iterators when they implement ConjunctionDISI. In practice, this would make "+A +(+B +C)" be executed more like "+A +B +C" (only in terms of matching, scoring would not change).
My motivation for this is that if we don't flatten and are unlucky, we can sometimes hit some worst cases. For instance consider the 3 following postings lists (sorted by increasing cost):
A: 1, 1001, 2001, 3001, ...
C: 0, 2, 4, 6, 8, 10, 12, 14, ...
B: 1, 3, 5, 7, 9, 11, 13, 15, ...
If we run "+A +B +C", then everything works fine, we use A as a lead, and advance B 1000 by 1000 to find the next match (if any).
However if we run "+A +(+B +C)", then we would iterate B and C 2 by 2 over the entire doc ID space when trying to find the first match which occurs on or after A:1.
This is an extreme example which is unlikely to happen in practice, but flattening would also help a bit on some more common cases. For instance imagine that A, B and C have respective costs of 100, 10 and 1000. If you search for "+A +(+B +C)", then we will use the most costly iterator (C) to confirm matches of B (the least costly iterator, used as a lead) while it would have been more efficient to confirm matches of B with A first, since A is less costly than C.
Activity
- All
- Comments
- Work Log
- History
- Activity
- Transitions
Here is a patch collapsing all subconjunctions (thanks Adrien Grand for the help with twophase).