Lucene - Core / LUCENE-6172

Improve the in-order / out-of-order collection decision process

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 5.0, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Today the logic is the following:

      • IndexSearcher checks whether the weight can score out of order
      • Depending on the answer, it creates the appropriate top docs/field collector
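
      As a rough illustration of the two steps above, here is a minimal sketch that uses simplified stand-in types rather than the real Lucene classes. SketchWeight, SketchTopDocsCollector and createCollector are hypothetical names introduced for this example; only the idea of asking the weight first and then picking the collector comes from the description.

```java
class CurrentDecisionSketch {

    /** Hypothetical stand-in for Weight: only the out-of-order flag matters here. */
    interface SketchWeight {
        boolean scoresDocsOutOfOrder();
    }

    /** Hypothetical stand-in for a top docs collector; records which variant was built. */
    static final class SketchTopDocsCollector {
        final boolean supportsOutOfOrder;

        SketchTopDocsCollector(boolean supportsOutOfOrder) {
            this.supportsOutOfOrder = supportsOutOfOrder;
        }
    }

    /** Mirrors the decision described above: ask the weight, then pick the collector. */
    static SketchTopDocsCollector createCollector(SketchWeight weight) {
        boolean outOfOrder = weight.scoresDocsOutOfOrder();
        return new SketchTopDocsCollector(outOfOrder);
    }

    public static void main(String[] args) {
        SketchWeight inOrder = () -> false;
        // A weight that always answers true, whatever it ends up doing at scoring time.
        SketchWeight alwaysTrue = () -> true;
        System.out.println(createCollector(inOrder).supportsOutOfOrder);    // false
        System.out.println(createCollector(alwaysTrue).supportsOutOfOrder); // true
    }
}
```

      Note that the collector is chosen before any scorer exists, so the choice is only as good as the flag the weight reports.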

      I think this has several issues:

      • Only IndexSearcher can actually make the decision correctly, and it only works for top docs/field collectors. If you want to build a multi collector in order to get both facets and top docs, you have no way of knowing whether you should create a top docs collector that supports out-of-order collection.
      • It is quite fragile: you need to make sure that Weight.scoresDocsOutOfOrder and Weight.bulkScorer agree on when they can score out of order. Some queries, like BooleanQuery, duplicate the logic, while others, like FilteredQuery, always return true to avoid the complexity. The latter is inefficient: IndexSearcher creates a collector that supports out-of-order collection even though the common case scores documents in order (leap frog between the query and the filter).

      Instead I would like to take advantage of the new collection API to make out-of-order scoring an implementation detail of the bulk scorers. My current idea is as follows:

      • remove Weight.scoresDocsOutOfOrder
      • change Collector.getLeafCollector(LeafReaderContext) to Collector.getLeafCollector(LeafReaderContext, boolean canScoreOutOfOrder)

      This new boolean in Collector.getLeafCollector tells the collector that the scorer supports out-of-order scoring. If the collector then returns a leaf collector that supports out-of-order collection, the scorer can use its out-of-order implementation, which should make things faster.

      The new logic would be the following. Weights would no longer declare up front whether they support out-of-order scoring. Instead, when a weight knows it supports out-of-order scoring, it passes canScoreOutOfOrder=true when getting the leaf collector. If the returned leaf collector accepts documents out of order, the weight returns an out-of-order scorer. Otherwise, it returns an in-order scorer.
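
      The negotiation described above could look roughly like the following sketch. It does not use the real Lucene interfaces: SketchCollector, SketchLeafCollector, RecordingCollector and score are hypothetical stand-ins, the LeafReaderContext argument is omitted, and the acceptsDocsOutOfOrder method on the leaf collector is an assumption about how the collector would report its answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OutOfOrderNegotiationSketch {

    /** Hypothetical stand-in for LeafCollector. */
    interface SketchLeafCollector {
        boolean acceptsDocsOutOfOrder();
        void collect(int doc);
    }

    /** Hypothetical stand-in for Collector with the proposed extra boolean. */
    interface SketchCollector {
        SketchLeafCollector getLeafCollector(boolean canScoreOutOfOrder);
    }

    /** Records doc ids in arrival order; tolerates out-of-order docs only if allowed. */
    static final class RecordingCollector implements SketchCollector {
        final boolean allowOutOfOrder;
        final List<Integer> docs = new ArrayList<>();

        RecordingCollector(boolean allowOutOfOrder) {
            this.allowOutOfOrder = allowOutOfOrder;
        }

        @Override
        public SketchLeafCollector getLeafCollector(boolean canScoreOutOfOrder) {
            // Hand out an out-of-order leaf collector only when the scorer can
            // use it and this collector is willing to support it.
            final boolean outOfOrder = canScoreOutOfOrder && allowOutOfOrder;
            return new SketchLeafCollector() {
                @Override
                public boolean acceptsDocsOutOfOrder() {
                    return outOfOrder;
                }

                @Override
                public void collect(int doc) {
                    docs.add(doc);
                }
            };
        }
    }

    /** Plays the role of a weight that knows it can score out of order. */
    static void score(int[] matchesInScoringOrder, SketchCollector collector) {
        SketchLeafCollector leaf = collector.getLeafCollector(true);
        int[] docs = matchesInScoringOrder.clone();
        if (!leaf.acceptsDocsOutOfOrder()) {
            // Fall back to the in-order "scorer": emit doc ids in increasing order.
            Arrays.sort(docs);
        }
        for (int doc : docs) {
            leaf.collect(doc);
        }
    }

    public static void main(String[] args) {
        RecordingCollector relaxed = new RecordingCollector(true);
        RecordingCollector strict = new RecordingCollector(false);
        score(new int[] {5, 2, 9}, relaxed);
        score(new int[] {5, 2, 9}, strict);
        System.out.println(relaxed.docs); // [5, 2, 9]
        System.out.println(strict.docs);  // [2, 5, 9]
    }
}
```

      In this sketch the decision is made per leaf collector at scoring time, so a wrapping collector (for example one combining facets and top docs) could answer on behalf of all the collectors it wraps.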

              People

              • Assignee:
                jpountz Adrien Grand
                Reporter:
                jpountz Adrien Grand
              • Votes:
                1
                Watchers:
                3