Lucene - Core / LUCENE-5418

Don't use .advance on costly (e.g. distance range facets) filters

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.7, 6.0
    • Component/s: modules/facet
    • Labels: None
    • Lucene Fields: New

    Description

      If you use a distance filter today (see http://blog.mikemccandless.com/2014/01/geospatial-distance-faceting-using.html ) and then drill down on one of those ranges, Lucene uses .advance on the Filter under the hood. This is very costly because we end up computing the distance for (possibly many) documents that don't match the query.

      It performs better to find the hits matching the Query first, and then check the filter only on those hits.
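
The difference between the two orderings can be illustrated with a small standalone model. This is only a sketch under simplifying assumptions, not Lucene code: doc IDs are plain ints, the costly distance check is stood in for by a cheap predicate wrapped in a hypothetical CountingFilter, and the names advanceOnFilter, queryFirst and nextMatch are made up for illustration. It counts how often the "expensive" check runs when the filter is advanced versus when the query's hits are checked one by one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class FilterStrategyDemo {

    /** Hypothetical wrapper that counts how often the (costly) check is evaluated. */
    static final class CountingFilter implements IntPredicate {
        private final IntPredicate delegate;
        int evaluations = 0;

        CountingFilter(IntPredicate delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean test(int doc) {
            evaluations++;
            return delegate.test(doc);
        }
    }

    /** Models advance(target) on a costly filter: test every doc from target until one matches. */
    static int nextMatch(IntPredicate filter, int target, int maxDoc) {
        for (int doc = target; doc < maxDoc; doc++) {
            if (filter.test(doc)) {
                return doc;
            }
        }
        return maxDoc; // exhausted
    }

    /** Leapfrog intersection in which the filter is advanced like an iterator. */
    static List<Integer> advanceOnFilter(int[] queryHits, int maxDoc, IntPredicate filter) {
        List<Integer> result = new ArrayList<>();
        int qi = 0;
        int filterDoc = nextMatch(filter, 0, maxDoc);
        while (qi < queryHits.length && filterDoc < maxDoc) {
            int queryDoc = queryHits[qi];
            if (queryDoc == filterDoc) {
                result.add(queryDoc);
                qi++;
                filterDoc = nextMatch(filter, filterDoc + 1, maxDoc);
            } else if (queryDoc < filterDoc) {
                qi++; // query is behind, move it forward
            } else {
                filterDoc = nextMatch(filter, queryDoc, maxDoc); // filter is behind
            }
        }
        return result;
    }

    /** Query-first: iterate only the query's hits and check the filter on each one. */
    static List<Integer> queryFirst(int[] queryHits, IntPredicate filter) {
        List<Integer> result = new ArrayList<>();
        for (int doc : queryHits) {
            if (filter.test(doc)) {
                result.add(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int maxDoc = 1000;
        int[] queryHits = {10, 500, 990}; // sorted doc IDs matching the query
        IntPredicate costly = doc -> doc % 100 == 90; // stand-in for a distance check

        CountingFilter advanced = new CountingFilter(costly);
        CountingFilter checked = new CountingFilter(costly);

        System.out.println("advance:     " + advanceOnFilter(queryHits, maxDoc, advanced)
                + " with " + advanced.evaluations + " filter evaluations");
        System.out.println("query first: " + queryFirst(queryHits, checked)
                + " with " + checked.evaluations + " filter evaluations");
    }
}
```

Both methods return the same hits; in the query-first variant the number of filter evaluations is bounded by the number of query hits, whereas the advancing variant also evaluates documents the query never matched.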

      FilteredQuery can already do this today, when you use its QUERY_FIRST_FILTER_STRATEGY. This essentially accomplishes the same thing as Solr's "post filters" (I think?) but with a far simpler approach that needs less code.

      E.g., I believe ElasticSearch uses this API when it applies costly filters.

      Longer term, I think a Query/Filter ought to know for itself that it's expensive. Then, in cases where such a Query/Filter is MUST'd onto a BooleanQuery (e.g. ConstantScoreQuery), or the Filter is a clause in a BooleanFilter, or it's passed to IndexSearcher.search, we should also be "smart" and not call .advance on such clauses. But that would be a fairly big change, so for today the "workaround" is that the user must carefully construct the FilteredQuery themselves.

      In the meantime, as another workaround, I want to fix DrillSideways so that when you drill down on such filters it doesn't use .advance; this should give a good speedup for the "normal path" API usage with a costly filter.

      I'm iterating on the lucene server branch (LUCENE-5376) but once it's working I plan to merge this back to trunk / 4.7.

      Attachments

        1. LUCENE-5418.patch
          98 kB
          Michael McCandless
        2. LUCENE-5418.patch
          74 kB
          Michael McCandless

          People

            Assignee: Michael McCandless (mikemccand)
            Reporter: Michael McCandless (mikemccand)