Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-6585

Make ConjunctionDISI flatten sub ConjunctionDISI instances


    • Improvement
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • None
    • 5.3, 6.0
    • None
    • None
    • New


      Today ConjunctionDISI wraps some sub (two-phase) iterators. I would like to improve it by flattening sub iterators when they implement ConjunctionDISI. In practice, this would make "+A +(+B +C)" be executed more like "+A +B +C" (only in terms of matching, scoring would not change).

      My motivation for this is that if we don't flatten and are unlucky, we can sometimes hit some worst cases. For instance consider the 3 following postings lists (sorted by increasing cost):

      A: 1, 1001, 2001, 3001, ...
      C: 0, 2, 4, 6, 8, 10, 12, 14, ...
      B: 1, 3, 5, 7, 9, 11, 13, 15, ...

      If we run "+A +B +C", then everything works fine, we use A as a lead, and advance B 1000 by 1000 to find the next match (if any).

      However if we run "+A +(+B +C)", then we would iterate B and C 2 by 2 over the entire doc ID space when trying to find the first match which occurs on or after A:1.

      This is an extreme example which is unlikely to happen in practice, but flattening would also help a bit on some more common cases. For instance imagine that A, B and C have respective costs of 100, 10 and 1000. If you search for "+A +(+B +C)", then we will use the most costly iterator (C) to confirm matches of B (the least costly iterator, used as a lead) while it would have been more efficient to confirm matches of B with A first, since A is less costly than C.


        1. LUCENE-6585.patch
          6 kB
          Ryan Ernst
        2. LUCENE-6585.patch
          5 kB
          Ryan Ernst



            Unassigned Unassigned
            jpountz Adrien Grand
            2 Vote for this issue
            8 Start watching this issue