Lucene - Core
  1. Lucene - Core
  2. LUCENE-3451

Remove special handling of pure negative Filters in BooleanFilter, disallow pure negative queries in BooleanQuery

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.9, 5.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      We should at least in Lucene 4.0 remove the hack in BooleanFilter that allows pure negative Filter clauses. This is not supported by BooleanQuery and confuses users (I think that's the problem in LUCENE-3450).

      The hack is buggy, as it does not respect deleted documents and returns them in its DocIdSet.

      Also we should think about disallowing pure-negative Queries at all and throw UOE.

      1. LUCENE-3451.patch
        13 kB
        Uwe Schindler
      2. LUCENE-3451.patch
        13 kB
        Uwe Schindler
      3. LUCENE-3451.patch
        14 kB
        Uwe Schindler
      4. LUCENE-3451.patch
        6 kB
        Uwe Schindler
      5. LUCENE-3451.patch
        3 kB
        Uwe Schindler

        Issue Links

          Activity

          Uwe Schindler created issue -
          Uwe Schindler made changes -
          Field Original Value New Value
          Link This issue supercedes LUCENE-3450 [ LUCENE-3450 ]
          Hide
          Uwe Schindler added a comment -

          Simple patch for trunk:

          • Disables the default full-1 bitset if no required/optional clauses
          • Moves the Prohibited clauses at the end. The order is now: SHOULD, MUST, MUST_NOT; I am not sure if this is correct and conforms to BooleanScorer2 (I don't understand BooleanScorer2). Does somebody know in which order clauses are applied in BooleanScorer2?
          Show
          Uwe Schindler added a comment - Simple patch for trunk: Disables the default full-1 bitset if no required/optional clauses Moves the Prohibited clauses at the end. The order is now: SHOULD, MUST, MUST_NOT; I am not sure if this is correct and conforms to BooleanScorer2 (I don't understand BooleanScorer2). Does somebody know in which order clauses are applied in BooleanScorer2?
          Uwe Schindler made changes -
          Attachment LUCENE-3451.patch [ 12496271 ]
          Hide
          Michael McCandless added a comment -

          Can we throw an exc if a BQ or BF has only MUST_NOT clauses, since we cannot handle it?

          I think silently doing the wrong thing is awful (eg led to Karl struggling to understand what was wrong).

          Show
          Michael McCandless added a comment - Can we throw an exc if a BQ or BF has only MUST_NOT clauses, since we cannot handle it? I think silently doing the wrong thing is awful (eg led to Karl struggling to understand what was wrong).
          Hide
          Chris Male added a comment -

          Can we throw an exc if a BQ or BF has only MUST_NOT clauses, since we cannot handle it?

          +1. Hit this problem myself many times when first using Solr and Lucene.

          Show
          Chris Male added a comment - Can we throw an exc if a BQ or BF has only MUST_NOT clauses, since we cannot handle it? +1. Hit this problem myself many times when first using Solr and Lucene.
          Uwe Schindler made changes -
          Summary Remove special handling of pure negative Filters in BooleanFilter (inconsistentToBQ+broken) Remove special handling of pure negative Filters in BooleanFilter, disallow pure negative queries in BooleanQuery
          Lucene Fields [New] [New, Patch Available]
          Description We should at least in Lucene 4.0 remove the hack in BooleanFilter that allows pure negative Filter clauses. This is not supported by BooleanQuery and confuses users (I think that's the problem in LUCENE-3450).

          The hack is buggy, as it does not respect deleted documents and returns them in its DocIdSet.
          We should at least in Lucene 4.0 remove the hack in BooleanFilter that allows pure negative Filter clauses. This is not supported by BooleanQuery and confuses users (I think that's the problem in LUCENE-3450).

          The hack is buggy, as it does not respect deleted documents and returns them in its DocIdSet.

          Also we should think about disallowing pure-negative Queries at all and throw UOE.
          Hide
          Uwe Schindler added a comment -

          Patch that also disallows only negative clauses in BooleanQuery and BooleanFilter.

          • For BF this check is done in getDocIdSet that throws UOE
          • For BQ this check is done in ctor of BooleanWeight where the clauses are enumerated and counted, so the check is just an additional low-cost check.

          Currently 2 core tests with DisjunctionMaxQuery fail (have to look into it, maybe they simply test the broken behaviour). Solr tests pass without problems (as Solr already has a special handling in QueryUtils).

          Show
          Uwe Schindler added a comment - Patch that also disallows only negative clauses in BooleanQuery and BooleanFilter. For BF this check is done in getDocIdSet that throws UOE For BQ this check is done in ctor of BooleanWeight where the clauses are enumerated and counted, so the check is just an additional low-cost check. Currently 2 core tests with DisjunctionMaxQuery fail (have to look into it, maybe they simply test the broken behaviour). Solr tests pass without problems (as Solr already has a special handling in QueryUtils).
          Uwe Schindler made changes -
          Attachment LUCENE-3451.patch [ 12496290 ]
          Hide
          Michael McCandless added a comment -

          Patch looks great!

          I think we should also backport to 3.x? And advertise the breakage. If users hit this exception it means their current searches aren't working so it's a service for us to inform them of this.

          Show
          Michael McCandless added a comment - Patch looks great! I think we should also backport to 3.x? And advertise the breakage. If users hit this exception it means their current searches aren't working so it's a service for us to inform them of this.
          Hide
          Uwe Schindler added a comment -

          Attached a patch with some changes in the BooleanQuery prohibited-only detection and all failing tests fixed:

          • TestBoolean2 had a explicit prohibited-only query -> changed to assert the UOE
          • TestBoolean2 had a random BQ generator. Fixed this generator to not create prohibited-only queries
          • in TestComplexExplanations* the prohibited-only fake clauses were removed
          • MemoryIndexTest used a list of lots of query-strings, some of them looking like that : "a -b". If the test had randomly choosen a MockTokenizer with english stopwords -> peng. Removed that Analyzer

          The last example is one problem of the explicit UOE: Now a user can suddenly get UOE if he uses a query where e.g. all positive clauses are stopwords - ok, users were always confused about that. Any comments about that?

          Show
          Uwe Schindler added a comment - Attached a patch with some changes in the BooleanQuery prohibited-only detection and all failing tests fixed: TestBoolean2 had a explicit prohibited-only query -> changed to assert the UOE TestBoolean2 had a random BQ generator. Fixed this generator to not create prohibited-only queries in TestComplexExplanations* the prohibited-only fake clauses were removed MemoryIndexTest used a list of lots of query-strings, some of them looking like that : "a -b". If the test had randomly choosen a MockTokenizer with english stopwords -> peng. Removed that Analyzer The last example is one problem of the explicit UOE: Now a user can suddenly get UOE if he uses a query where e.g. all positive clauses are stopwords - ok, users were always confused about that. Any comments about that?
          Uwe Schindler made changes -
          Attachment LUCENE-3451.patch [ 12496334 ]
          Hide
          Uwe Schindler added a comment -

          There is a debugging relict in last patch: BooleanQuery.this.toString() in BooleanWeights UOE, please remove that.

          Show
          Uwe Schindler added a comment - There is a debugging relict in last patch: BooleanQuery.this.toString() in BooleanWeights UOE, please remove that.
          Hide
          Michael McCandless added a comment -

          Patch looks great Uwe!

          Nice catch on the analyzers removing stop words and then making an all MUST_NOT BQ. But, I think we should throw an exception in this case, since it's a horrible trap now? User will get 0 results but that's flat out silently wrong?

          Show
          Michael McCandless added a comment - Patch looks great Uwe! Nice catch on the analyzers removing stop words and then making an all MUST_NOT BQ. But, I think we should throw an exception in this case, since it's a horrible trap now? User will get 0 results but that's flat out silently wrong?
          Hide
          Uwe Schindler added a comment -

          Updated patch after committing BF cleanup (LUCENE-3458).

          Show
          Uwe Schindler added a comment - Updated patch after committing BF cleanup ( LUCENE-3458 ).
          Uwe Schindler made changes -
          Attachment LUCENE-3451.patch [ 12496392 ]
          Hide
          Mike Sokolov added a comment - - edited

          Just wondering if there is a reason not to "fix" the user's query by adding a

          *:*

          (maybe implicitly), rather than throwing an Exception? This is invariably the fix users are instructed to apply in this case, and it does seem to be the logical implication of a pure not-query.

          Hmm - I just found the discussion over in LUCENE-3460, which addresses this point.

          Show
          Mike Sokolov added a comment - - edited Just wondering if there is a reason not to "fix" the user's query by adding a *:* (maybe implicitly), rather than throwing an Exception? This is invariably the fix users are instructed to apply in this case, and it does seem to be the logical implication of a pure not-query. Hmm - I just found the discussion over in LUCENE-3460 , which addresses this point.
          Hide
          Uwe Schindler added a comment -

          Mike,
          See Sub-Task LUCENE-3460 for an explanation.

          Show
          Uwe Schindler added a comment - Mike, See Sub-Task LUCENE-3460 for an explanation.
          Hide
          Uwe Schindler added a comment -

          How should we proceed with this? I would like to commit this, but I am afraid of the consequences for users without a solution in QueryParser.

          Show
          Uwe Schindler added a comment - How should we proceed with this? I would like to commit this, but I am afraid of the consequences for users without a solution in QueryParser.
          Hide
          Uwe Schindler added a comment -

          Updated patch for trunk.

          Show
          Uwe Schindler added a comment - Updated patch for trunk.
          Uwe Schindler made changes -
          Attachment LUCENE-3451.patch [ 12503449 ]
          Hide
          Yonik Seeley added a comment -

          The current handling of boolean queries with only prohibited clauses is not a bug, but working as designed, so this issue is about changing that behavior. Currently working applications will now start unexpectedly throwing exceptions... now that's trappy.

          I think we should also backport to 3.x? And advertise the breakage. If users hit this exception it means their current searches aren't working

          We shouldn't make assumptions like that - those applications may be working exactly as designed.
          The issue with stopwords is complex - there is no right way IMO (returning all documents vs no documents), and some users will continue to be surprised by the results regardless of which you do.

          Changing BQ to prohibit negative queries means that it will immediately mean that our QueryParser (and maybe other code) has a very serious bug. This issue can't be committed without addressing other parts of Lucene and Solr that can generate negative queries first.

          I'm currently leaning toward the current behavior of BQ.

          Show
          Yonik Seeley added a comment - The current handling of boolean queries with only prohibited clauses is not a bug, but working as designed, so this issue is about changing that behavior. Currently working applications will now start unexpectedly throwing exceptions... now that's trappy. I think we should also backport to 3.x? And advertise the breakage. If users hit this exception it means their current searches aren't working We shouldn't make assumptions like that - those applications may be working exactly as designed. The issue with stopwords is complex - there is no right way IMO (returning all documents vs no documents), and some users will continue to be surprised by the results regardless of which you do. Changing BQ to prohibit negative queries means that it will immediately mean that our QueryParser (and maybe other code) has a very serious bug. This issue can't be committed without addressing other parts of Lucene and Solr that can generate negative queries first. I'm currently leaning toward the current behavior of BQ.
          Yonik Seeley made changes -
          Issue Type Bug [ 1 ] Improvement [ 4 ]
          Hide
          Uwe Schindler added a comment -

          This is qhy I opened the sub-issue to change queryparser.

          At least we should make BoolenFilter in contrib/queries to behave like BQ and not assume all bits set initially when only negative clauses occur. We might only want to add a MatchAllDocumentsFilter to allow the behaviour from before.

          Show
          Uwe Schindler added a comment - This is qhy I opened the sub-issue to change queryparser. At least we should make BoolenFilter in contrib/queries to behave like BQ and not assume all bits set initially when only negative clauses occur. We might only want to add a MatchAllDocumentsFilter to allow the behaviour from before.
          Robert Muir made changes -
          Fix Version/s 4.1 [ 12321140 ]
          Fix Version/s 4.0 [ 12314025 ]
          Steve Rowe made changes -
          Fix Version/s 4.2 [ 12323899 ]
          Fix Version/s 4.1 [ 12321140 ]
          Robert Muir made changes -
          Fix Version/s 4.3 [ 12324143 ]
          Fix Version/s 4.2 [ 12323899 ]
          Uwe Schindler made changes -
          Fix Version/s 4.4 [ 12324323 ]
          Fix Version/s 4.3 [ 12324143 ]
          Hide
          Steve Rowe added a comment -

          Bulk move 4.4 issues to 4.5 and 5.0

          Show
          Steve Rowe added a comment - Bulk move 4.4 issues to 4.5 and 5.0
          Steve Rowe made changes -
          Fix Version/s 5.0 [ 12321663 ]
          Fix Version/s 4.5 [ 12324742 ]
          Fix Version/s 4.4 [ 12324323 ]
          Adrien Grand made changes -
          Fix Version/s 4.6 [ 12324999 ]
          Fix Version/s 5.0 [ 12321663 ]
          Fix Version/s 4.5 [ 12324742 ]
          Simon Willnauer made changes -
          Fix Version/s 4.7 [ 12325572 ]
          Fix Version/s 4.6 [ 12324999 ]
          David Smiley made changes -
          Fix Version/s 4.8 [ 12326269 ]
          Fix Version/s 4.7 [ 12325572 ]
          Hide
          Uwe Schindler added a comment -

          Move issue to Lucene 4.9.

          Show
          Uwe Schindler added a comment - Move issue to Lucene 4.9.
          Uwe Schindler made changes -
          Fix Version/s 4.9 [ 12326730 ]
          Fix Version/s 5.0 [ 12321663 ]
          Fix Version/s 4.8 [ 12326269 ]
          Hide
          Jack Krupansky added a comment -

          Yonik Seeley says:

          The current handling of boolean queries with only prohibited clauses is not a bug, but working as designed, so this issue is about changing that behavior. Currently working applications will now start unexpectedly throwing exceptions... now that's trappy.

          The fact that a pure negative query, actually a sub-query within parentheses in the query parser, returns zero documents has been a MAJOR problem for Solr users. I've lost count how many times it has come up on the user list and we tell users to work around the problem by manually inserting "*:*" after the left parenthesis.

          But I am interested in hearing why it is believed that it is "working as designed" and whether there are really applications that would intentionally write a list of negative clauses when the "design" is that they will simply be ignored and match no documents. If that kind of compatibility is really needed, I would say it can be accommodated with a config setting, rather than give unexpected and bad behavior for so many other people with the current behavior.

          I would prefer to see a "fix" the problem by having BQ do the right thing by implicitly starting with a MatchAllDocsQuery if only MUST_NOT clauses are present, but... if that is not possible, an exception would be much better.

          Alternatively, given the difficulty of doing almost anything with the various query parsers, the method that generates the BQ for the query parser (QueryParserBase .getBooleanQuery) should just check for pure negative clauses and then add the MADQ. If this is massively controversial, just add a config option to disable it.

          Show
          Jack Krupansky added a comment - Yonik Seeley says: The current handling of boolean queries with only prohibited clauses is not a bug, but working as designed, so this issue is about changing that behavior. Currently working applications will now start unexpectedly throwing exceptions... now that's trappy. The fact that a pure negative query, actually a sub-query within parentheses in the query parser, returns zero documents has been a MAJOR problem for Solr users. I've lost count how many times it has come up on the user list and we tell users to work around the problem by manually inserting "*:*" after the left parenthesis. But I am interested in hearing why it is believed that it is "working as designed" and whether there are really applications that would intentionally write a list of negative clauses when the "design" is that they will simply be ignored and match no documents. If that kind of compatibility is really needed, I would say it can be accommodated with a config setting, rather than give unexpected and bad behavior for so many other people with the current behavior. I would prefer to see a "fix" the problem by having BQ do the right thing by implicitly starting with a MatchAllDocsQuery if only MUST_NOT clauses are present, but... if that is not possible, an exception would be much better. Alternatively, given the difficulty of doing almost anything with the various query parsers, the method that generates the BQ for the query parser (QueryParserBase .getBooleanQuery) should just check for pure negative clauses and then add the MADQ. If this is massively controversial, just add a config option to disable it.
          Hide
          Yonik Seeley added a comment - - edited

          But I am interested in hearing why it is believed that it is "working as designed"

          "Working as designed" means just that... not that it's optimal, but that it is working the way the original author intended. FWIW, I was really only against throwing an exception. I personally think it would be fine to insert *:* for the user where appropriate.

          and whether there are really applications that would intentionally write a list of negative clauses

          Machine generated queries (including those from our own query parsers).
          For example, (a -x) reduces to (-x) if "a" is a stopword. Inserting *:* when a boolean query contains only negative clauses was vetoed in LUCENE-3460.

          Show
          Yonik Seeley added a comment - - edited But I am interested in hearing why it is believed that it is "working as designed" "Working as designed" means just that... not that it's optimal, but that it is working the way the original author intended. FWIW, I was really only against throwing an exception. I personally think it would be fine to insert *:* for the user where appropriate. and whether there are really applications that would intentionally write a list of negative clauses Machine generated queries (including those from our own query parsers). For example, (a -x) reduces to (-x) if "a" is a stopword. Inserting *:* when a boolean query contains only negative clauses was vetoed in LUCENE-3460 .
          Hide
          Jack Krupansky added a comment -

          Thanks, Yonik Seeley. Although the "(a -x)" stop word case seems to argue even more strenuously for at least an exception if ]*:* can't be inserted.

          Besides, the stop word case is better handled by the Lucid approach of keeping all stop words (if they are indexed) if the sub-query terms are all stop words as in this case. So it would be only be problematic for the case of non-indexed stop words, which is really an anti-pattern anyway these days.

          Show
          Jack Krupansky added a comment - Thanks, Yonik Seeley . Although the "(a -x)" stop word case seems to argue even more strenuously for at least an exception if ]*:* can't be inserted. Besides, the stop word case is better handled by the Lucid approach of keeping all stop words (if they are indexed) if the sub-query terms are all stop words as in this case. So it would be only be problematic for the case of non-indexed stop words, which is really an anti-pattern anyway these days.
          Hide
          Jack Krupansky added a comment -

          Yonik Seeley says:

          I personally think it would be fine to insert : for the user where appropriate.

          Ah! Since the "divorce" that gave Solr "custody" of its own copy of QueryParserBase, this change could be made there, right? I can file a Solr Jira for that (or just use one of the two open Solr issues related to pure-negative sub-queries), unless you want to do it. And then if the Solr people are happy over there, the Lucene guys can have their exception here and close this issue, and the everybody can live happily ever after, right?

          Show
          Jack Krupansky added a comment - Yonik Seeley says: I personally think it would be fine to insert : for the user where appropriate. Ah! Since the "divorce" that gave Solr "custody" of its own copy of QueryParserBase, this change could be made there, right? I can file a Solr Jira for that (or just use one of the two open Solr issues related to pure-negative sub-queries), unless you want to do it. And then if the Solr people are happy over there, the Lucene guys can have their exception here and close this issue, and the everybody can live happily ever after, right?

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Uwe Schindler
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:

                Development