Lucene - Core
  1. Lucene - Core
  2. LUCENE-1134

BooleanQuery.rewrite does not work properly for minNumberShouldMatch

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4
    • Fix Version/s: None
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      BooleanQuery.rewrite does not respect minNumberShouldMatch if the number of clauses is 1. This causes inconsistencies for the queries "+def" and "+abc +def", while setting the minNumShouldMatch to '1' for both.
      For the first query, results are returned although there are no SHOULD clauses in the query.
      For the second query no results are returned.
      The reason lies in the optimization BooleanQuery.rewrite has for one clauses queries.
      Patch included - optimize the query for a single clause only if the minNumShouldMatch <= 0.

        Activity

        Hide
        Shai Erera added a comment -

        The fix is very trivial. Which tests should I run to ensure this doesn't break anything?

        Show
        Shai Erera added a comment - The fix is very trivial. Which tests should I run to ensure this doesn't break anything?
        Hide
        Paul Elschot added a comment -

        I'd start with these tests:

        ant -Dtestcase='Test*Bool*' test-core

        You might also consider adding a test case in one of those tests that shows the bug without the patch applied.

        Show
        Paul Elschot added a comment - I'd start with these tests: ant -Dtestcase='Test*Bool*' test-core You might also consider adding a test case in one of those tests that shows the bug without the patch applied.
        Hide
        Hoss Man added a comment -

        At first glance, the patch seems right .. but an optimization can probably be made ... if there is only once clause, and minNrShouldMatch is greater then 1, the query can rewrite itself to a "no-op" type query that doesn't match any docs.

        Actually: BooleanQuery.rewrite can do something like that anytime minNrShouldMath is greater then the number of optional clauses.

        Show
        Hoss Man added a comment - At first glance, the patch seems right .. but an optimization can probably be made ... if there is only once clause, and minNrShouldMatch is greater then 1, the query can rewrite itself to a "no-op" type query that doesn't match any docs. Actually: BooleanQuery.rewrite can do something like that anytime minNrShouldMath is greater then the number of optional clauses.
        Hide
        Paul Elschot added a comment -

        There is no real need for this optimization because BooleanScorer2 already uses a NonMatchingScorer for this case.

        Show
        Paul Elschot added a comment - There is no real need for this optimization because BooleanScorer2 already uses a NonMatchingScorer for this case.
        Hide
        Hoss Man added a comment -

        Ah... good call.

        (Hmmm.... it stil might be worth doing just to make it clear to clients earlier rather then later that they won't match anything. ie:If you inspect a rewritten FuzzyQuery you get to see right away how many clauses it has ... why not let people inspecting a rewritten BooleanQuery in cases like this see right away that their query is a No-Op ?)

        Show
        Hoss Man added a comment - Ah... good call. (Hmmm.... it stil might be worth doing just to make it clear to clients earlier rather then later that they won't match anything. ie:If you inspect a rewritten FuzzyQuery you get to see right away how many clauses it has ... why not let people inspecting a rewritten BooleanQuery in cases like this see right away that their query is a No-Op ?)
        Hide
        Paul Elschot added a comment -

        A FuzzyQuery rewrites to a BooleanQuery with SHOULD clauses only, so a check and cast to BooleanQuery and then a clauses().size() == 0 can check for a No-Op after a rewrite().

        Show
        Paul Elschot added a comment - A FuzzyQuery rewrites to a BooleanQuery with SHOULD clauses only, so a check and cast to BooleanQuery and then a clauses().size() == 0 can check for a No-Op after a rewrite().
        Hide
        Hoss Man added a comment -

        I think you're misunderstanding me Paul: i was just trying to make an analogy – even if the Scorer optimizes away a BooleanQuery that can never match anything into a NonMatchingScorer, there are other advantages to having rewrite detect these situations (when it can) ... inspection of the rewritten queries being an example.

        Show
        Hoss Man added a comment - I think you're misunderstanding me Paul: i was just trying to make an analogy – even if the Scorer optimizes away a BooleanQuery that can never match anything into a NonMatchingScorer, there are other advantages to having rewrite detect these situations (when it can) ... inspection of the rewritten queries being an example.
        Hide
        Shai Erera added a comment -

        Hi

        W.r.t. testing, I found TestBooleanMinShouldMatch. It had testNoOptionalButMin which added two MUST clauses. I added testNoOptionalButMin2 that adds only a single clause. It fails without the patch and passes with the patch. Other than that the tests in this class pass.
        I updated the patch to include the additional test.

        W.r.t. the optimization that was discussed: I don't understand how it can be done. BooleanQuery has a clauses list. It is not aware of any SHOULD, MUST or MUST_NOT clauses in its rewrite() method. So how exactly this optimization can be done? The only class that seems to be aware of the type of clauses is BooleanScorer2 in its add() method.

        Show
        Shai Erera added a comment - Hi W.r.t. testing, I found TestBooleanMinShouldMatch. It had testNoOptionalButMin which added two MUST clauses. I added testNoOptionalButMin2 that adds only a single clause. It fails without the patch and passes with the patch. Other than that the tests in this class pass. I updated the patch to include the additional test. W.r.t. the optimization that was discussed: I don't understand how it can be done. BooleanQuery has a clauses list. It is not aware of any SHOULD, MUST or MUST_NOT clauses in its rewrite() method. So how exactly this optimization can be done? The only class that seems to be aware of the type of clauses is BooleanScorer2 in its add() method.
        Hide
        Shai Erera added a comment -

        Fix + Test

        Show
        Shai Erera added a comment - Fix + Test
        Hide
        Michael Busch added a comment -

        Committed. Thanks, Shai!

        Show
        Michael Busch added a comment - Committed. Thanks, Shai!

          People

          • Assignee:
            Unassigned
            Reporter:
            Shai Erera
          • Votes:
            1 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development