  1. Lucene - Core
  2. LUCENE-6890

Specialize 1D dimensional values intersection

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Lucene Fields: New

    Description

      I tried implementing the same specialization we had before LUCENE-6881 for the 1D case, but after testing it, I don't think it's worth it.

      I'll upload the patch here for posterity (tests pass), but on net it adds non-trivial code complexity in exchange for a minor query speedup (5.39 sec -> 5.25 sec for 225 queries, roughly 2.6%). Maybe in the future someone could improve this so it's more compelling, but I don't think the tradeoff is worth it today.

      Furthermore, the optimization 1) requires an API change, and 2) is not even valid in the current patch: it assumes the query is a single range, but the query could be a union of multiple disjoint ranges.

      The gist of the idea is to locate the start leaf block and the end leaf block, make an informed estimate of the expected result set size, and then do a linear scan of the leaf blocks, instead of the recursion and "grow per leaf block" we do today. I think the conclusion is that this used to be a more sizable win, but DocIdSetBuilder has improved so that it is plenty fast without "upfront" growing, which is nice.
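
      To make the idea concrete, here is a toy sketch in plain Java. It is not the code in the attached patch and does not use any Lucene API: the `Leaf` class, `buildLeaves`, and `intersect` are hypothetical names, the values are plain ints, and the "estimate" is simply an upper bound (the total size of the candidate leaf blocks). It also assumes the query is a single closed range [min, max], which is exactly the assumption called out above as not holding in general.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy 1D model: globally sorted values split into consecutive leaf blocks. */
class OneDimRangeScan {

    /** Hypothetical leaf block: values sorted ascending, docIDs parallel to values. */
    static final class Leaf {
        final int[] values;
        final int[] docIDs;
        Leaf(int[] values, int[] docIDs) { this.values = values; this.docIDs = docIDs; }
        int min() { return values[0]; }
        int max() { return values[values.length - 1]; }
    }

    /** Splits sorted values into leaves; for illustration, docID = position in the array. */
    static List<Leaf> buildLeaves(int[] sortedValues, int leafSize) {
        List<Leaf> leaves = new ArrayList<>();
        for (int from = 0; from < sortedValues.length; from += leafSize) {
            int to = Math.min(from + leafSize, sortedValues.length);
            int[] docIDs = new int[to - from];
            for (int i = 0; i < docIDs.length; i++) docIDs[i] = from + i;
            leaves.add(new Leaf(Arrays.copyOfRange(sortedValues, from, to), docIDs));
        }
        return leaves;
    }

    /** Returns the docIDs whose value lies in the single closed range [min, max]. */
    static int[] intersect(List<Leaf> leaves, int min, int max) {
        // Start leaf: first leaf whose max value is >= min (binary search).
        int lo = 0, hi = leaves.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (leaves.get(mid).max() < min) lo = mid + 1; else hi = mid;
        }
        int start = lo;

        // End leaf: last leaf whose min value is <= max (binary search).
        lo = 0;
        hi = leaves.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (leaves.get(mid).min() <= max) lo = mid + 1; else hi = mid;
        }
        int end = lo - 1;

        if (start > end) return new int[0]; // the range falls outside or between leaves

        // Upfront size estimate: an upper bound, the total size of the candidate leaves.
        int estimate = 0;
        for (int i = start; i <= end; i++) estimate += leaves.get(i).docIDs.length;
        int[] hits = new int[estimate];
        int count = 0;

        // Linear scan. Only the two boundary leaves need per-value checks; because
        // the leaves are globally sorted, every interior leaf is fully inside the range.
        for (int i = start; i <= end; i++) {
            Leaf leaf = leaves.get(i);
            boolean boundary = (i == start || i == end);
            for (int j = 0; j < leaf.values.length; j++) {
                int v = leaf.values[j];
                if (!boundary || (v >= min && v <= max)) hits[count++] = leaf.docIDs[j];
            }
        }
        return Arrays.copyOf(hits, count);
    }
}
```

      With 20 values 0, 2, ..., 38 in leaves of 4, a query for [5, 27] scans only the first four leaves and filters values in just the first and last of them. A query made of several disjoint ranges would break the "interior leaves match entirely" shortcut, which is the correctness problem noted above.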

      Or maybe my benchmark is bogus.

      I'll commit the minor code comment / TODOs / test improvements from the patch ...

      Attachments

        1. LUCENE-6890.patch
          13 kB
          Michael McCandless

          People

            Assignee: Unassigned
            Reporter: Michael McCandless (mikemccand)