Lucene - Core / LUCENE-72

[PATCH] Query parser inconsistency when using terms to exclude.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 1.2
    • Fix Version/s: None
    • Component/s: core/queryparser
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: PC

      Description

      Hi.
      The problem I am having occurs both when using QueryParser and when building
      the query through the API.
      Assume that we want to find documents about fruits or vegetables, while
      excluding tomatoes and bananas. I suppose the right query should be:

      +(fruits vegetables) AND (-tomatoes -bananas)

      which I think is equivalent to the following (if you parse it and then print the
      result of query.toString(""), that is what you get):

      +(fruits vegetables) +(-tomatoes -bananas)

      but that query doesn't work as expected. In fact, the query that works is

      +(fruits vegetables) -(-tomatoes -bananas)

      which doesn't really make much sense, because the second clause seems to select
      all documents for which the condition "tomatoes is not present and bananas is not
      present" is false, which is the opposite of what I want.

      In fact, the second query behaves like this one (even though they look quite opposite):
      +(fruits vegetables) -tomatoes -bananas

      I hope someone can help. Thanks.

      1. ASF.LICENSE.NOT.GRANTED--patch6.txt
        0.8 kB
        Jean-François Halleux
      2. ASF.LICENSE.NOT.GRANTED--patch7.txt
        0.7 kB
        Jean-François Halleux
      3. TestRegressionLucene72.java
        7 kB
        Dejan Nenov
      4. TestRegressionLucene72.java
        8 kB
        Dejan Nenov


          Activity

          Shai Erera added a comment -

          As per the discussion, this should have been closed a long time ago.

          Dejan Nenov added a comment -

          This makes sense. I concur that the docs could be better; maybe we should open a separate JIRA issue for that?

          I propose that this be closed as "will not fix".

          Hoss Man added a comment -

          I think the general issue here is that mixing syntax (i.e., using AND, OR, or NOT along with "+" and "-") does not work very well in the QueryParser.

          At the lowest level, the "+" and "-" syntax most closely models the way Lucene BooleanQueries work. More specifically, they are not truly boolean queries; they are aggregation queries, in which each subquery can be required, optional, or prohibited, but at least one must always "match" and positively select some documents. (A BooleanQuery containing only "prohibited" clauses is invalid: it matches nothing.)
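
          The clause rules described above can be illustrated with a small, self-contained toy model. This is a hypothetical sketch, not Lucene's API or its scorer: a "document" is just a set of terms, and every name in it (ClauseSemantics, Occur, BooleanQ, req/opt/not/t/bool) is invented for illustration. It assumes a document matches when it satisfies every required clause and no prohibited clause, and, if there are no required clauses, at least one optional clause.

```java
import java.util.List;
import java.util.Set;

// Hypothetical toy model, not Lucene's API: a "document" is the set of terms it contains.
final class ClauseSemantics {

    // "+" = REQUIRED, no prefix = OPTIONAL, "-" = PROHIBITED
    enum Occur { REQUIRED, OPTIONAL, PROHIBITED }

    interface Query { boolean matches(Set<String> doc); }

    record Term(String text) implements Query {
        public boolean matches(Set<String> doc) { return doc.contains(text); }
    }

    record Clause(Occur occur, Query query) {}

    record BooleanQ(List<Clause> clauses) implements Query {
        public boolean matches(Set<String> doc) {
            boolean hasRequired = false;
            boolean optionalHit = false;
            for (Clause c : clauses) {
                boolean m = c.query().matches(doc);
                switch (c.occur()) {
                    case REQUIRED -> {
                        if (!m) return false;   // every required clause must match
                        hasRequired = true;
                    }
                    case PROHIBITED -> {
                        if (m) return false;    // no prohibited clause may match
                    }
                    case OPTIONAL -> optionalHit |= m;
                }
            }
            // Something must positively select the document: a required clause,
            // or else at least one optional clause. Prohibited clauses alone never do.
            return hasRequired || optionalHit;
        }
    }

    static Clause req(Query q) { return new Clause(Occur.REQUIRED, q); }
    static Clause opt(Query q) { return new Clause(Occur.OPTIONAL, q); }
    static Clause not(Query q) { return new Clause(Occur.PROHIBITED, q); }
    static Term t(String s) { return new Term(s); }
    static BooleanQ bool(Clause... cs) { return new BooleanQ(List.of(cs)); }

    public static void main(String[] args) {
        Set<String> doc = Set.of("fruits", "kiwis");
        // -tomatoes -bananas: only prohibited clauses, so nothing is selected
        System.out.println(bool(not(t("tomatoes")), not(t("bananas"))).matches(doc));   // false
        // fruits -tomatoes: the optional clause does the selecting
        System.out.println(bool(opt(t("fruits")), not(t("tomatoes"))).matches(doc));    // true
        // +fruits +(-tomatoes -bananas): the nested all-prohibited query never matches
        System.out.println(bool(req(t("fruits")),
                req(bool(not(t("tomatoes")), not(t("bananas"))))).matches(doc));       // false
    }
}
```

          In this model, prohibited clauses can only remove documents; they never select any on their own.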

          Setting the default operator on the QueryParser to "OR" or "AND" really just tells the QueryParser whether sub-queries should be "optional" or "required" by default, in the absence of other information.
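
          A minimal sketch of what the default operator decides, assuming a flat query of whitespace-separated terms (no parentheses, no AND/OR/NOT keywords). DefaultOperatorSketch and toClauses are hypothetical names, not QueryParser code; the point is only that an explicit "+" or "-" prefix always wins, and the default operator fills in the occurrence of unprefixed terms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not QueryParser's implementation: flat terms only.
final class DefaultOperatorSketch {

    enum Occur { REQUIRED, OPTIONAL, PROHIBITED }

    record Clause(Occur occur, String term) {
        @Override public String toString() {
            return switch (occur) {
                case REQUIRED -> "+" + term;
                case PROHIBITED -> "-" + term;
                case OPTIONAL -> term;
            };
        }
    }

    static List<Clause> toClauses(String query, boolean defaultIsAnd) {
        List<Clause> out = new ArrayList<>();
        for (String tok : query.trim().split("\\s+")) {
            if (tok.startsWith("+")) {
                out.add(new Clause(Occur.REQUIRED, tok.substring(1)));
            } else if (tok.startsWith("-")) {
                out.add(new Clause(Occur.PROHIBITED, tok.substring(1)));
            } else {
                // No prefix: the default operator supplies the missing information.
                out.add(new Clause(defaultIsAnd ? Occur.REQUIRED : Occur.OPTIONAL, tok));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(toClauses("fruits vegetables -tomatoes", false)); // [fruits, vegetables, -tomatoes]
        System.out.println(toClauses("fruits vegetables -tomatoes", true));  // [+fruits, +vegetables, -tomatoes]
    }
}
```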

          When specifying a query like: +(fruits vegetables) AND (-tomatoes -bananas)
          ...this is really just a variant expression of: +(fruits vegetables) +(-tomatoes -bananas)
          ...which is not a valid query, because the second clause (containing only prohibited clauses) doesn't match anything, so no document can satisfy both required clauses.

          When specifying a query like: fruits OR -tomatoes
          ...this is really just a variant expression of: fruits -tomatoes
          ...which (since there is only one "optional" clause and no "required" clauses) will only match documents containing the word "fruits", and only as long as they do not contain the word "tomatoes".

          In short, things are behaving as expected. The only question is whether the documentation could be improved to make this behavior clearer.

          As for the (now very old) patches attached to this bug, as far as I can tell they don't actually seem to be related at all.

          Dejan Nenov added a comment -

          Please ignore the previous version; it was very sloppy.
          I added one more test, which yields a result that is strange (to me):

          fruits OR -tomatoes

          returns only:

          fruits vegetables peppers kiwis
          fruits vegetables peppers bananas

          but does not return:

          fruits vegetables tomatoes bananas
          fruits vegetables tomatoes kiwis

          I would have expected all four docs to match.
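
          Following the clause rules Hoss Man describes in this thread, "fruits OR -tomatoes" reduces to one optional clause (fruits) and one prohibited clause (tomatoes). The sketch below is a hypothetical flat-query model, not Lucene code, evaluated over the four documents from the attached test. Assuming that a query without required clauses must match at least one optional clause, it selects only the two "peppers" documents, which agrees with the result reported above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical flat-query model (not Lucene code): documents are sets of terms.
final class FruitsOrNotTomatoes {

    // The four documents from the attached test, kept in insertion order.
    static final Map<Integer, Set<String>> DOCS = new LinkedHashMap<>();
    static {
        DOCS.put(1, Set.of("fruits", "vegetables", "tomatoes", "bananas"));
        DOCS.put(2, Set.of("fruits", "vegetables", "tomatoes", "kiwis"));
        DOCS.put(3, Set.of("fruits", "vegetables", "peppers", "kiwis"));
        DOCS.put(4, Set.of("fruits", "vegetables", "peppers", "bananas"));
    }

    static boolean matches(Set<String> doc, List<String> required,
                           List<String> optional, List<String> prohibited) {
        for (String t : prohibited) if (doc.contains(t)) return false;
        for (String t : required) if (!doc.contains(t)) return false;
        if (!required.isEmpty()) return true;        // optional terms would only affect scoring
        for (String t : optional) if (doc.contains(t)) return true;
        return false;                                // nothing positively selected the doc
    }

    static List<Integer> search(List<String> required, List<String> optional,
                                List<String> prohibited) {
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, Set<String>> e : DOCS.entrySet()) {
            if (matches(e.getValue(), required, optional, prohibited)) hits.add(e.getKey());
        }
        return hits;
    }

    public static void main(String[] args) {
        // "fruits OR -tomatoes" treated as "fruits -tomatoes":
        // optional "fruits", prohibited "tomatoes"
        System.out.println(search(List.of(), List.of("fruits"), List.of("tomatoes"))); // [3, 4]
    }
}
```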

          Dejan Nenov added a comment -

          This issue was so old that I wanted to verify that it still exists.
          The attached test is specific to the issue and indeed shows that

          +(fruits vegetables) AND (-tomatoes -bananas)

          does not perform as expected.

          I use "QueryParser.setDefaultOperator(QueryParser.OR_OPERATOR)"
          and I set up four documents:

          Doc1 = fruits vegetables tomatoes bananas
          Doc2 = fruits vegetables tomatoes kiwis
          Doc3 = fruits vegetables peppers kiwis
          Doc4 = fruits vegetables peppers bananas

          My expectation is to get docs 2, 3, and 4; instead, the query returns no hits.

          Somebody please check that this makes sense.

          I have not run this test with the attached patches applied, however; I decided not to spend the time applying two-year-old patches to the current release.
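
          A toy model (hypothetical; not Lucene's classes, and not a reproduction of Lucene 1.2's exact behavior) can check these four documents against the rule that a BooleanQuery with only prohibited clauses matches nothing. Under that assumption, the reported query returns no hits, "+(fruits vegetables) -tomatoes -bananas" returns only Doc3, and docs 2, 3, and 4 are what you get by excluding only documents that contain both words, written in this model as "+(fruits vegetables) -(+tomatoes +bananas)". All names below (Lucene72Model, term, bool, req, opt, not, search) are invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical toy model (not Lucene's classes): a query is a predicate over a
// document's set of terms, and boolean queries may be nested.
final class Lucene72Model {

    enum Occur { REQUIRED, OPTIONAL, PROHIBITED }

    record Clause(Occur occur, Predicate<Set<String>> query) {}

    static Predicate<Set<String>> term(String t) { return doc -> doc.contains(t); }

    static Predicate<Set<String>> bool(Clause... clauses) {
        return doc -> {
            boolean hasRequired = false;
            boolean optionalHit = false;
            for (Clause c : clauses) {
                boolean m = c.query().test(doc);
                switch (c.occur()) {
                    case REQUIRED -> {
                        if (!m) return false;
                        hasRequired = true;
                    }
                    case PROHIBITED -> {
                        if (m) return false;
                    }
                    case OPTIONAL -> optionalHit |= m;
                }
            }
            // Assumption: prohibited clauses alone never select a document.
            return hasRequired || optionalHit;
        };
    }

    static Clause req(Predicate<Set<String>> q) { return new Clause(Occur.REQUIRED, q); }
    static Clause opt(Predicate<Set<String>> q) { return new Clause(Occur.OPTIONAL, q); }
    static Clause not(Predicate<Set<String>> q) { return new Clause(Occur.PROHIBITED, q); }

    static final Map<Integer, Set<String>> DOCS = new LinkedHashMap<>();
    static {
        DOCS.put(1, Set.of("fruits", "vegetables", "tomatoes", "bananas"));
        DOCS.put(2, Set.of("fruits", "vegetables", "tomatoes", "kiwis"));
        DOCS.put(3, Set.of("fruits", "vegetables", "peppers", "kiwis"));
        DOCS.put(4, Set.of("fruits", "vegetables", "peppers", "bananas"));
    }

    static List<Integer> search(Predicate<Set<String>> q) {
        List<Integer> hits = new ArrayList<>();
        DOCS.forEach((id, doc) -> { if (q.test(doc)) hits.add(id); });
        return hits;
    }

    // (fruits vegetables): two optional clauses
    static final Predicate<Set<String>> FRUITS_OR_VEG =
            bool(opt(term("fruits")), opt(term("vegetables")));

    public static void main(String[] args) {
        // +(fruits vegetables) +(-tomatoes -bananas)
        System.out.println(search(bool(req(FRUITS_OR_VEG),
                req(bool(not(term("tomatoes")), not(term("bananas")))))));   // []
        // +(fruits vegetables) -tomatoes -bananas
        System.out.println(search(bool(req(FRUITS_OR_VEG),
                not(term("tomatoes")), not(term("bananas")))));              // [3]
        // +(fruits vegetables) -(+tomatoes +bananas)
        System.out.println(search(bool(req(FRUITS_OR_VEG),
                not(bool(req(term("tomatoes")), req(term("bananas")))))));   // [2, 3, 4]
    }
}
```

          Which of the last two formulations is the intended one depends on whether documents mentioning either fruit, or only those mentioning both, should be excluded.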

          Jean-François Halleux added a comment -

          Created an attachment (id=10276)
          Some more unit tests in the "escaped" department

          Jean-François Halleux added a comment -

          Created an attachment (id=10275)
          A patch to queryparser to properly handle escaping char in field


            People

            • Assignee: Unassigned
            • Reporter: Carlos
            • Votes: 1
            • Watchers: 0
