Lucene - Core: LUCENE-7628

Add a getMatchingChildren() method to DisjunctionScorer

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.5
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      This one is a bit convoluted, so bear with me...

      The luwak highlighter works by rewriting queries into their Span equivalents and then running them with a special Collector. At each matching doc, the highlighter gathers all the Spans objects positioned on the current doc and collects their positions using the SpanCollector API.
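      The per-document gathering step can be pictured with the simplified model below. It is a hypothetical sketch, not luwak's code: FakeSpans and gather() are illustrative names that only mimic the Spans iteration pattern (advance with nextStartPosition() until NO_MORE_POSITIONS), and positions are recorded as strings instead of being passed to a real SpanCollector. Note that the loop trusts each Spans to be in a valid state for the doc it reports.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the per-document gathering step.
// FakeSpans is NOT Lucene's Spans class; it only mimics the
// nextStartPosition()/endPosition() iteration pattern.
class PerDocSpanGather {
    static final int NO_MORE_POSITIONS = Integer.MAX_VALUE;

    static final class FakeSpans {
        final int docID;            // document this Spans is positioned on
        private final int[][] hits; // [start, end) pairs on that document
        private int idx = -1;

        FakeSpans(int docID, int[][] hits) {
            this.docID = docID;
            this.hits = hits;
        }

        int nextStartPosition() {
            idx++;
            return idx < hits.length ? hits[idx][0] : NO_MORE_POSITIONS;
        }

        // Only valid after nextStartPosition() returned a real position.
        int endPosition() {
            return hits[idx][1];
        }
    }

    // At one matching doc, drain the positions of every Spans on that doc.
    static List<String> gather(List<FakeSpans> allSpans, int doc) {
        List<String> out = new ArrayList<>();
        for (FakeSpans spans : allSpans) {
            if (spans.docID != doc) {
                continue; // positioned elsewhere: nothing to collect here
            }
            int start;
            while ((start = spans.nextStartPosition()) != NO_MORE_POSITIONS) {
                out.add(start + "-" + spans.endPosition());
            }
        }
        return out;
    }
}
```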

      Some queries can't be translated into Spans. For queries whose Scorers have child scorers, like BooleanQuery, we can call .getChildren() on the Scorer and check whether any of the children are SpanScorers; for those that aren't, we call .getChildren() again and recurse down. For each child scorer, we check that it's positioned on the current document, so non-matching subscorers can be skipped.
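      The recursive walk described above might look roughly like this. The Node class is a hypothetical stand-in for a scorer (isSpanScorer replaces an instanceof SpanScorer test, docID replaces Scorer.docID(), children replaces getChildren()); it is a sketch of the traversal logic, not Lucene's Scorer API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a scorer tree; NOT Lucene's Scorer API.
class ScorerTreeWalk {
    static final class Node {
        final String name;
        final boolean isSpanScorer; // stands in for "instanceof SpanScorer"
        final int docID;            // doc this scorer is currently positioned on
        final List<Node> children;  // stands in for getChildren()

        Node(String name, boolean isSpanScorer, int docID, List<Node> children) {
            this.name = name;
            this.isSpanScorer = isSpanScorer;
            this.docID = docID;
            this.children = children;
        }
    }

    // Recurse through non-span scorers, keeping span scorers positioned on
    // currentDoc and skipping any subtree whose root is not on currentDoc.
    static void collectSpanScorers(Node scorer, int currentDoc, List<Node> out) {
        if (scorer.docID != currentDoc) {
            return; // non-matching subscorer: skip it and everything under it
        }
        if (scorer.isSpanScorer) {
            out.add(scorer);
            return;
        }
        for (Node child : scorer.children) {
            collectSpanScorers(child, currentDoc, out);
        }
    }
}
```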

      This all works correctly except for a DisjunctionScorer where one of the children is a two-phase iterator that has matched its approximation but not its refinement (the TwoPhaseIterator.matches() check). A SpanScorer in this situation will be correctly positioned on the current document, but its Spans will be in an undefined state, so the highlighter will either collect incorrect hits or throw an Exception that prevents hits from being collected from the other subspans.
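      The failure mode can be reproduced in a small hypothetical model. Child, disjunctionMatches() and positionedOn() are illustrative names, not Lucene classes: each child reports the doc its cheap approximation landed on, and a separate flag records whether the expensive matches() check passed there. The doc-positioning filter from the walk above cannot tell the two cases apart.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a disjunction whose children are two-phase iterators.
// NOT Lucene's DisjunctionScorer or TwoPhaseIterator; names are illustrative.
class TwoPhasePitfall {
    static final class Child {
        final String name;
        final int approximationDoc; // doc the cheap approximation advanced to
        final boolean matchesOnDoc; // outcome of the expensive matches() check there

        Child(String name, int approximationDoc, boolean matchesOnDoc) {
            this.name = name;
            this.approximationDoc = approximationDoc;
            this.matchesOnDoc = matchesOnDoc;
        }

        // The child reports the approximation's doc even when matches() fails.
        int docID() {
            return approximationDoc;
        }
    }

    // The disjunction matches doc if any child on doc passes its matches() check.
    static boolean disjunctionMatches(List<Child> children, int doc) {
        for (Child c : children) {
            if (c.docID() == doc && c.matchesOnDoc) {
                return true;
            }
        }
        return false;
    }

    // The highlighter's filter: "is this child positioned on the current doc?"
    // It cannot tell a verified match from an approximation-only one.
    static List<String> positionedOn(List<Child> children, int doc) {
        List<String> names = new ArrayList<>();
        for (Child c : children) {
            if (c.docID() == doc) {
                names.add(c.name);
            }
        }
        return names;
    }
}
```

      Here a doc where "term" matches but "spanNear" only passes its approximation makes the disjunction match, yet both children look positioned on the doc, so the highlighter would try to read positions from spanNear's undefined Spans.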

      We've tried various ways around this (including forking SpanNearQuery and adding a bunch of slow position checks to it that are used only by the highlighting code), but it turns out that the simplest fix is to add a new method to DisjunctionScorer that only returns the currently matching child Scorers. It's a bit of a hack, and it won't be used anywhere else, but it's a fairly small and contained hack.
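      The idea behind getMatchingChildren() can be sketched in the same hypothetical model. This is not the code from the attached patch and not Lucene's DisjunctionScorer; Disjunction, matchesAt() and Child are illustrative. The assumption here is that the disjunction runs every child's two-phase check on the current doc and remembers which ones actually matched, so the highlighter only ever sees children whose Spans are in a defined state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the idea behind getMatchingChildren(); NOT the code
// from the attached patch and NOT Lucene's DisjunctionScorer.
class MatchingChildrenSketch {
    static final class Child {
        final String name;
        final int approximationDoc; // doc the cheap approximation advanced to
        final boolean matchesOnDoc; // outcome of the expensive matches() check there

        Child(String name, int approximationDoc, boolean matchesOnDoc) {
            this.name = name;
            this.approximationDoc = approximationDoc;
            this.matchesOnDoc = matchesOnDoc;
        }
    }

    static final class Disjunction {
        private final List<Child> children;
        private final List<Child> verified = new ArrayList<>();

        Disjunction(List<Child> children) {
            this.children = children;
        }

        // Position on doc and run every child's two-phase check there,
        // remembering which children actually matched (eager check: an assumption).
        boolean matchesAt(int doc) {
            verified.clear();
            for (Child c : children) {
                if (c.approximationDoc == doc && c.matchesOnDoc) {
                    verified.add(c);
                }
            }
            return !verified.isEmpty();
        }

        // Only children that passed the full check on the current doc, unlike a
        // getChildren()-style view that also includes approximation-only children.
        List<Child> getMatchingChildren() {
            return new ArrayList<>(verified); // copy: unaffected by later matchesAt calls
        }
    }
}
```

      Checking every child eagerly costs extra matches() calls that a plain disjunction could skip once one child matches; in this sketch that cost is acceptable because, as the description notes, the method is only meant for the highlighting path.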

      Attachments

        1. LUCENE-7628.patch (9 kB, Alan Woodward)
        2. LUCENE-7628.patch (4 kB, Alan Woodward)

            People

              Assignee: Alan Woodward (romseygeek)
              Reporter: Alan Woodward (romseygeek)
              Votes: 0
              Watchers: 8