Lucene - Core
  1. Lucene - Core
  2. LUCENE-5796

Scorer.getChildren() can throw or hide a subscorer for some boolean queries

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 4.9
    • Fix Version/s: 4.10, 6.0
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      I've isolated two example boolean queries that don't behave with release 4.9 of Lucene.

      1. A BooleanQuery with three SHOULD clauses and a minimumNumberShouldMatch of 2 will throw an ArrayIndexOutOfBoundsException.
        java.lang.ArrayIndexOutOfBoundsException: 2
        	at __randomizedtesting.SeedInfo.seed([2F79B3DF917D071B:2539E6DBC4DF793C]:0)
        	at org.apache.lucene.search.MinShouldMatchSumScorer.getChildren(MinShouldMatchSumScorer.java:119)
        	at org.apache.lucene.search.TestBooleanQueryVisitSubscorers$ScorerSummarizingCollector.summarizeScorer(TestBooleanQueryVisitSubscorers.java:261)
        	at org.apache.lucene.search.TestBooleanQueryVisitSubscorers$ScorerSummarizingCollector.setScorer(TestBooleanQueryVisitSubscorers.java:238)
        	at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:161)
        	at org.apache.lucene.search.AssertingBulkScorer.score(AssertingBulkScorer.java:64)
        	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
        	at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:94)
        	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
        	at org.apache.lucene.search.TestBooleanQueryVisitSubscorers.testGetChildrenMinShouldMatchSumScorer(TestBooleanQueryVisitSubscorers.java:196)
        
      2. A BooleanQuery with two should clauses, one of which is a miss for all documents in the current segment will accidentally mask the scorer that was a hit.

      Unit tests and patch based on branch_4x are available and will be attached as soon as this ticket has a number.

      They are immediately available on GitHub on branch shebiki/bqgetchildren as commit c64bb6f.

      I took the liberty of naming the relationship in BoostingScorer.getChildren() BOOSTING. Suspect someone will offer a better name for this. Here is a summary of the various relationships in play for all Scorer.getChildren() implementations on branch_4x to help choose.

      class relationships
      org.apache.lucene.search.AssertingScorer SHOULD
      org.apache.lucene.search.join.ToParentBlockJoinQuery.BlockJoinScorer BLOCK_JOIN
      org.apache.lucene.search.ConjunctionScorer MUST
      org.apache.lucene.search.ConstantScoreQuery.ConstantScorer constant
      org.apache.lucene.queries.function.BoostedQuery.CustomScorer CUSTOM
      org.apache.lucene.queries.CustomScoreQuery.CustomScorer CUSTOM
      org.apache.lucene.search.DisjunctionScorer SHOULD
      org.apache.lucene.facet.DrillSidewaysScorer.FakeScorer MUST
      org.apache.lucene.search.FilterScorer calls in.getChildren()
      org.apache.lucene.search.ScoreCachingWrappingScorer CACHED
      org.apache.lucene.search.FilteredQuery.LeapFrogScorer FILTERED
      org.apache.lucene.search.MinShouldMatchSumScorer SHOULD
      org.apache.lucene.search.FilteredQuery FILTERED
      org.apache.lucene.search.ReqExclScorer MUST
      org.apache.lucene.search.ReqOptSumScorer MUST, SHOULD
      org.apache.lucene.search.join.ToChildBlockJoinQuery BLOCK_JOIN

      I also removed FilterScorer.getChildren() to prevent mistakes and force subclasses to provide a correct implementation.

        Activity

        Hide
        Robert Muir added a comment -

        I don't think we should remove the default implementation for FilterScorer, as the scorer is not really changed when using this abstract class, its just wrapped? For the same reason, i think the boostingscorer (since its just an implementation detail of how the current BS2 stuff solves this case) should be transparent.

        Show
        Robert Muir added a comment - I don't think we should remove the default implementation for FilterScorer, as the scorer is not really changed when using this abstract class, its just wrapped? For the same reason, i think the boostingscorer (since its just an implementation detail of how the current BS2 stuff solves this case) should be transparent.
        Hide
        Terry Smith added a comment -

        Thanks for taking the time to review my patch and comment on the approach.

        The reason that I advocated changing FilterScorer and BoostedScorer is to allow some of my custom Query implementations to use a regular BooleanQuery for recall and optionally scoring while taking advantage of the actual Scorers used on a per document, per clause basis.

        This has been working great across quite a few Lucene releases but failed when I upgraded to 4.9 due to the two regressions in behavior for Scorer.getChildren() as described in this ticket.

        In this scenario, a BooleanQuery containing two TermQueries (one a miss and the other a hit) returns the following from BooleanWeight.scorer():

        • BoostedScorer
          • TermScorer (hit)

        Calling getChildren() on this returns an empty list because the BoostedScorer just returns in.getChildren() and thus you are unable to navigate to the actual TermScorer in play. This would impact any classes that extend FilterScorer and don't override getChildren(). In other words, the current wiring does make the BoostedScorer transparent but with the disadvantage of hiding the actual scorer that performs the work.

        If this is an unsupported workflow, I'm happy to move the discussion over to the user mailing list.

        Show
        Terry Smith added a comment - Thanks for taking the time to review my patch and comment on the approach. The reason that I advocated changing FilterScorer and BoostedScorer is to allow some of my custom Query implementations to use a regular BooleanQuery for recall and optionally scoring while taking advantage of the actual Scorers used on a per document, per clause basis. This has been working great across quite a few Lucene releases but failed when I upgraded to 4.9 due to the two regressions in behavior for Scorer.getChildren() as described in this ticket. In this scenario, a BooleanQuery containing two TermQueries (one a miss and the other a hit) returns the following from BooleanWeight.scorer(): BoostedScorer TermScorer (hit) Calling getChildren() on this returns an empty list because the BoostedScorer just returns in.getChildren() and thus you are unable to navigate to the actual TermScorer in play. This would impact any classes that extend FilterScorer and don't override getChildren(). In other words, the current wiring does make the BoostedScorer transparent but with the disadvantage of hiding the actual scorer that performs the work. If this is an unsupported workflow, I'm happy to move the discussion over to the user mailing list.
        Hide
        Robert Muir added a comment -

        I see: this makes sense. If you have a custom scorer you may need access to the raw one, so this makes sense to remove the transparency... I'll look at the patch again and reply back if I have more questions.

        Show
        Robert Muir added a comment - I see: this makes sense. If you have a custom scorer you may need access to the raw one, so this makes sense to remove the transparency... I'll look at the patch again and reply back if I have more questions.
        Hide
        ASF subversion and git services added a comment -

        Commit 1608454 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1608454 ]

        LUCENE-5796: Fix Scorer getChildren for two combinations of BooleanQuery

        Show
        ASF subversion and git services added a comment - Commit 1608454 from Robert Muir in branch 'dev/trunk' [ https://svn.apache.org/r1608454 ] LUCENE-5796 : Fix Scorer getChildren for two combinations of BooleanQuery
        Hide
        ASF subversion and git services added a comment -

        Commit 1608457 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1608457 ]

        LUCENE-5796: Fix Scorer getChildren for two combinations of BooleanQuery

        Show
        ASF subversion and git services added a comment - Commit 1608457 from Robert Muir in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1608457 ] LUCENE-5796 : Fix Scorer getChildren for two combinations of BooleanQuery
        Hide
        Robert Muir added a comment -

        Thanks Terry!

        Show
        Robert Muir added a comment - Thanks Terry!

          People

          • Assignee:
            Unassigned
            Reporter:
            Terry Smith
          • Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development