
Key: LUCENE-72
Type: Bug
Status: Open
Priority: Minor
Assignee: Lucene Developers
Reporter: Carlos
Votes: 1
Watchers: 0
Lucene - Java

[PATCH] Query parser inconsistency when using terms to exclude.

Created: 31/Dec/02 10:48 PM   Updated: 31/Dec/07 01:52 PM
Component/s: QueryParser
Affects Version/s: 1.2
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments:
  patch6.txt (text file, 0.8 kB), 2004-02-09 08:10 PM, Jean-François Halleux
  patch7.txt (text file, 0.7 kB), 2004-02-09 08:14 PM, Jean-François Halleux
  TestRegressionLucene72.java (Java source, licensed for inclusion in ASF works, 8 kB), 2006-08-29 05:08 PM, Dejan Nenov
  TestRegressionLucene72.java (Java source, licensed for inclusion in ASF works, 7 kB), 2006-08-29 05:58 AM, Dejan Nenov
Environment:
Operating System: All
Platform: PC
Issue Links:
Reference
 

Bugzilla Id: 15739


Description
Hi.
The problem I am having occurs both when using the QueryParser and when building
the query through the API.
Assume that we want to look for documents about fruits or vegetables but
excluding tomatoes and bananas. I suppose the right query should be:

+(fruits vegetables) AND (-tomatoes -bananas)

which I think is equivalent to (if you parse it and then print the result of
query.toString(""), that is what you get):

+(fruits vegetables) +(-tomatoes -bananas)

but the query doesn't work as expected, in fact the query that works is

+(fruits vegetables) -(-tomatoes -bananas)

which doesn't really make much sense, because the second part seems to say:
all documents where the condition "tomatoes is not present and bananas is not
present" is false, which means the opposite.

In fact, the second query works like this one (even though they look like opposites):
+(fruits vegetables) -tomatoes -bananas

Hope someone can help, thanks.



Comments
Jean-François Halleux added a comment - 09/Feb/04 08:10 PM
Created an attachment (id=10275)
A patch to the QueryParser to properly handle escape characters in fields

Jean-François Halleux added a comment - 09/Feb/04 08:14 PM
Created an attachment (id=10276)
Some more unit tests in the "escaped" department

Dejan Nenov added a comment - 29/Aug/06 05:58 AM
This issue was so old that I wanted to verify that it still exists.
The attached test is specific to the issue and indeed shows that

+(fruits vegetables) AND (-tomatoes -bananas)

does not perform as expected.

I use "QueryParser.setDefaultOperator(QueryParser.OR_OPERATOR)"
and I setup 4 documents:

Doc1 = fruits vegetables tomatoes bananas
Doc2 = fruits vegetables tomatoes kiwis
Doc3 = fruits vegetables peppers kiwis
Doc4 = fruits vegetables peppers bananas

My expectation is to get docs 2, 3 and 4; instead, the query returns no hits.

Somebody please check that this makes sense.

I have not run this test with the attached patches applied, however; I decided not to spend the time applying 2-year-old patches to the current release.


Dejan Nenov added a comment - 29/Aug/06 05:08 PM
Please ignore the previous version; it was very sloppy.
I added one more test, which yields a strange (for me) result:

fruits OR -tomatoes

returns only:

fruits vegetables peppers kiwis
fruits vegetables peppers bananas

but does not return:

fruits vegetables tomatoes bananas
fruits vegetables tomatoes kiwis

I would have expected all four docs to match.


Hoss Man added a comment - 30/Aug/06 05:53 PM
I think the general issue here is that mixing syntax (i.e. using AND, OR or NOT along with "+" and "-") does not work very well in the QueryParser.

At the lowest level, the "+" and "-" syntax most closely models the way Lucene BooleanQueries work. Most specifically, they are not truly boolean queries; they are aggregation queries, in which each sub-query can be required, optional or prohibited, but at least one must always "match" and positively select some documents. (It is invalid to have a BooleanQuery containing only "prohibited" clauses.)

Setting the default operator on the QueryParser to "OR" or "AND" really just tells the QueryParser whether you want sub-queries to be "optional" or "required" by default, in the absence of other information.
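To make the default-operator point concrete, here is a minimal sketch (Java 16+, for records) that assumes a query made only of whitespace-separated terms with optional "+" or "-" prefixes: no AND/OR/NOT keywords and no parentheses. The class DefaultOperatorSketch, its Occur enum and its parse method are hypothetical illustrations, not the QueryParser API; later Lucene releases express the same three roles as BooleanClause.Occur (MUST, SHOULD, MUST_NOT).

```java
import java.util.ArrayList;
import java.util.List;

class DefaultOperatorSketch {
    // Hypothetical clause roles: required (+), optional (bare), prohibited (-).
    enum Occur { REQUIRED, OPTIONAL, PROHIBITED }

    record Clause(String term, Occur occur) {}

    // Assumption: only prefix syntax is handled (no AND/OR/NOT, no parentheses).
    static List<Clause> parse(String query, boolean defaultIsAnd) {
        List<Clause> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // nothing to parse (e.g. an empty query string)
            }
            if (token.startsWith("+")) {
                clauses.add(new Clause(token.substring(1), Occur.REQUIRED));
            } else if (token.startsWith("-")) {
                clauses.add(new Clause(token.substring(1), Occur.PROHIBITED));
            } else {
                // A bare term gets its role from the default operator only.
                clauses.add(new Clause(token, defaultIsAnd ? Occur.REQUIRED : Occur.OPTIONAL));
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        System.out.println(parse("fruits -tomatoes", false)); // default operator OR
        System.out.println(parse("fruits -tomatoes", true));  // default operator AND
    }
}
```

With OR as the default, "fruits -tomatoes" yields an optional "fruits" clause and a prohibited "tomatoes" clause; with AND, "fruits" becomes required. An explicit "+" or "-" prefix is unaffected either way.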

When specifying a query like: +(fruits vegetables) AND (-tomatoes -bananas)
...this is really just a variant expression of: +(fruits vegetables) +(-tomatoes -bananas)
...which is not a valid query, because the second clause doesn't match anything.

When specifying a query like: fruits OR -tomatoes
...this is really just a variant expression of: fruits -tomatoes
...which (since there is only one "optional" clause and no "required" clauses) will only match documents containing the word "fruits", as long as they do not also contain the word "tomatoes".
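The following is a simplified, hypothetical model of the aggregation semantics described in this comment, applied to the four documents from the earlier test case. It is not Lucene code: the class BooleanClauseSketch and its Term, Group and Clause types are invented for illustration, and scoring is ignored. A group matches when every required clause matches and no prohibited clause matches. When it has no required clauses, at least one optional clause must also match, so a group made only of prohibited clauses selects nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class BooleanClauseSketch {
    // Hypothetical clause roles: required (+), optional (bare), prohibited (-).
    enum Occur { REQUIRED, OPTIONAL, PROHIBITED }

    // A query node is either a single term or a nested group of clauses.
    interface Node {
        boolean matches(Set<String> doc);
    }

    record Term(String text) implements Node {
        public boolean matches(Set<String> doc) {
            return doc.contains(text);
        }
    }

    record Clause(Node node, Occur occur) {}

    // Simplified matching rule; a group with no required and no optional
    // clauses can never match.
    record Group(List<Clause> clauses) implements Node {
        public boolean matches(Set<String> doc) {
            boolean hasRequired = false;
            boolean anyOptionalMatched = false;
            for (Clause c : clauses) {
                boolean m = c.node().matches(doc);
                switch (c.occur()) {
                    case REQUIRED -> {
                        hasRequired = true;
                        if (!m) return false;
                    }
                    case PROHIBITED -> {
                        if (m) return false;
                    }
                    case OPTIONAL -> {
                        if (m) anyOptionalMatched = true;
                    }
                }
            }
            return hasRequired || anyOptionalMatched;
        }
    }

    static Term t(String s) { return new Term(s); }
    static Clause req(Node n) { return new Clause(n, Occur.REQUIRED); }
    static Clause opt(Node n) { return new Clause(n, Occur.OPTIONAL); }
    static Clause prohibit(Node n) { return new Clause(n, Occur.PROHIBITED); }
    static Group group(Clause... cs) { return new Group(List.of(cs)); }

    // The four documents from the test case, as sets of terms.
    static final List<Set<String>> DOCS = List.of(
            Set.of("fruits", "vegetables", "tomatoes", "bananas"),
            Set.of("fruits", "vegetables", "tomatoes", "kiwis"),
            Set.of("fruits", "vegetables", "peppers", "kiwis"),
            Set.of("fruits", "vegetables", "peppers", "bananas"));

    // +(fruits vegetables) +(-tomatoes -bananas)
    static final Node NESTED_NEGATIVE = group(
            req(group(opt(t("fruits")), opt(t("vegetables")))),
            req(group(prohibit(t("tomatoes")), prohibit(t("bananas")))));

    // fruits -tomatoes
    static final Node FRUITS_NOT_TOMATOES = group(opt(t("fruits")), prohibit(t("tomatoes")));

    // +(fruits vegetables) -tomatoes -bananas
    static final Node FLAT_EXCLUSIONS = group(
            req(group(opt(t("fruits")), opt(t("vegetables")))),
            prohibit(t("tomatoes")), prohibit(t("bananas")));

    // Returns the 1-based numbers of the documents the query matches.
    static List<Integer> search(Node query) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < DOCS.size(); i++) {
            if (query.matches(DOCS.get(i))) hits.add(i + 1);
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search(NESTED_NEGATIVE));     // []
        System.out.println(search(FRUITS_NOT_TOMATOES)); // [3, 4]
        System.out.println(search(FLAT_EXCLUSIONS));     // [3]
    }
}
```

In this model, the nested pure-negative group makes the first query return no documents, "fruits -tomatoes" returns documents 3 and 4 (the same two documents Dejan reported), and moving the exclusions up to the top level returns only document 3, the one document containing neither tomatoes nor bananas.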

In short, things are behaving as expected. The only question is whether the documentation could be improved to make the behavior clearer.

As for the (now very old) patches attached to this bug, as far as I can tell they don't actually seem to be related at all.


Dejan Nenov added a comment - 31/Aug/06 02:07 AM
This makes sense. I concur that the docs could be better; maybe we should open a separate JIRA issue for that?

I propose closing this as "will not fix".