
Key: LUCENE-1300
Type: Bug
Status: Resolved
Resolution: Duplicate
Priority: Minor
Assignee: Unassigned
Reporter: steve halsey
Votes: 0
Watchers: 0
Lucene - Java

Negative wildcard searches on MultiSearcher not eliminating correctly.

Created: 06/Jun/08 05:44 PM   Updated: 11/Sep/08 02:27 PM
Component/s: Search
Affects Version/s: 2.1, 2.3, 2.3.1
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments:
  TestMultiSearcherNegativeWildcardQueryExpansion.java (Java source file, licensed for inclusion in ASF works), 2008-06-06 05:47 PM, steve halsey, 7 kB
  TestMultiSearcherNegativeWildcardQueryExpansionWorksWith151.java (Java source file, licensed for inclusion in ASF works), 2008-06-06 06:10 PM, steve halsey, 8 kB
Environment: Windows XP, cygwin.
Issue Links:
Duplicate
 

Lucene Fields: New
Resolution Date: 11/Sep/08 02:27 PM


Description
If you run a negative wildcard query such as "lucene -bug*" on a MultiSearcher where one of the searchers is empty, the hits returned incorrectly include documents containing words that should be excluded, e.g. "bug" and "bugs". This is because the query is expanded separately against the index containing docs and against the empty index, and the two expansions are then combined as an OR to be run on the MultiSearcher. This incorrectly lets in docs that contain the excluded wildcard terms, e.g. "bug" and "bugs". The bug also shows up with two indexes that both contain docs, and I can send a test showing that if required, but I think this test demonstrates the bug in the simplest way.
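The mechanism can be reproduced without Lucene in a small toy model. This is a hypothetical sketch, not Lucene code: a document is a set of terms, an index is a list of documents, "lucene -bug*" is rewritten by expanding bug* against one index's terms, and the per-index rewrites are ORed together as described above. The class and method names are made up for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model, not Lucene code: a document is a set of terms and an
// index is a list of documents. Models rewriting "+lucene -bug*" per index and
// ORing the per-index rewrites together.
public class PerIndexExpansionModel {

    // Expand a prefix wildcard against the terms that occur in a single index.
    static Set<String> expand(String prefix, List<Set<String>> index) {
        Set<String> terms = new TreeSet<>();
        for (Set<String> doc : index) {
            for (String term : doc) {
                if (term.startsWith(prefix)) {
                    terms.add(term);
                }
            }
        }
        return terms;
    }

    // One rewritten query: the doc must contain "lucene" and none of the excluded terms.
    static boolean matchesRewrite(Set<String> doc, Set<String> excluded) {
        return doc.contains("lucene") && Collections.disjoint(doc, excluded);
    }

    // The flawed strategy: rewrite against each index, then OR the rewrites,
    // so a doc passes if ANY per-index rewrite accepts it.
    static boolean combinedMatches(Set<String> doc, List<List<Set<String>>> indexes) {
        for (List<Set<String>> index : indexes) {
            if (matchesRewrite(doc, expand("bug", index))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Set<String>> full = List.of(Set.of("lucene", "bug"), Set.of("lucene", "bugs"));
        List<Set<String>> empty = List.of();
        List<List<Set<String>>> indexes = List.of(full, empty);

        // The empty index expands -bug* to an empty exclusion, i.e. "-()",
        // so its rewrite accepts every doc containing "lucene".
        // Both docs are reported as matches even though each contains a bug* term.
        for (Set<String> doc : full) {
            System.out.println(new TreeSet<>(doc) + " matched: " + combinedMatches(doc, indexes));
        }
    }
}
```

Dropping the empty index from the list makes both docs disappear from the results, which is the behaviour the test expects.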

The attached class TestMultiSearcherNegativeWildcardQueryExpansion.java can be added to the other tests in org.apache.lucene.search; when run, it fails, showing that the bug exists.

I have tested this bug with the currently unreleased 2.3.2 and the released 2.1 and 2.3.1 and it fails on all.

With lucene-1.5-rc1 it passes (with modifications to make it work with the old API); see the TestMultiSearcherNegativeWildcardQueryExpansionWorksWith151.java attachment.



steve halsey added a comment - 06/Jun/08 05:47 PM
This test fails, demonstrating the existence of the negative wildcard query on MultiSearchers bug.

steve halsey added a comment - 06/Jun/08 06:10 PM
This test class shows the test passing with the old 1.5.1 version of Lucene, modified to work with that old API.

Mark Miller added a comment - 06/Jun/08 07:49 PM
Great catch, Steve. The combine method in Query appears to be very flawed when it comes to MUST_NOT occurrences and truncation queries. It's a nasty little bug that does indeed appear to go back to '05. Thanks for all of the detailed info. I am sure someone will be right on top of this.

Mark Miller added a comment - 07/Jun/08 02:43 AM
Looks like the test worked before because things were (it would appear) even worse: the bug was that the multi-term query was only expanded on the first index, and the resulting query was then used on all of the indexes. The issue that introduced the bug you have found was an attempt to fix this by expanding on each Reader and then building a single query that works across all of the Readers. The strategy seems to work in non-MUST_NOT cases, but the generated query can simply be wrong when there is a MUST_NOT occurrence. As you point out, the second index doesn't even have to be empty, and the second -() clause doesn't have to be empty either; the generated query can still be wrong.
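The older behaviour can be checked in the same kind of toy model. This hypothetical sketch (not Lucene code; names are made up) expands bug* only against the first index and reuses that expansion everywhere. With Steve's layout (full index first, empty index second) the excluded doc is correctly dropped, which is why the old test passed, but a term that exists only in a later index is never excluded.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model, not Lucene code: models the older behaviour where the
// wildcard is expanded on the first index only and the resulting query is then
// run against every index.
public class FirstIndexExpansionModel {

    // Expand a prefix wildcard against the terms that occur in a single index.
    static Set<String> expand(String prefix, List<Set<String>> index) {
        Set<String> terms = new TreeSet<>();
        for (Set<String> doc : index) {
            for (String term : doc) {
                if (term.startsWith(prefix)) {
                    terms.add(term);
                }
            }
        }
        return terms;
    }

    // "+lucene -bug*" with bug* expanded against indexes.get(0) only.
    static boolean firstIndexOnlyMatches(Set<String> doc, List<List<Set<String>>> indexes) {
        Set<String> excluded = expand("bug", indexes.get(0));
        return doc.contains("lucene") && Collections.disjoint(doc, excluded);
    }

    public static void main(String[] args) {
        Set<String> bugDoc = Set.of("lucene", "bug");
        Set<String> bugsDoc = Set.of("lucene", "bugs");
        List<Set<String>> empty = List.of();

        // Steve's layout: all docs in the first index, second index empty.
        List<List<Set<String>>> stevesLayout = List.of(List.of(bugDoc, bugsDoc), empty);
        System.out.println(firstIndexOnlyMatches(bugsDoc, stevesLayout)); // false: correctly excluded

        // "bugs" occurs only in the second index, so the first-index expansion misses it.
        List<List<Set<String>>> splitLayout = List.of(List.of(bugDoc), List.of(bugsDoc));
        System.out.println(firstIndexOnlyMatches(bugsDoc, splitLayout)); // true: wrongly matched
    }
}
```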

I don't see an obvious fix: either we need a query that expands against all of the subreaders as if they were a single reader, or the combine method has to figure this out...neither seems easy to me...


Mark Miller added a comment - 09/Jun/08 01:54 AM
One option may be to do this:

Figure out how we can change Searchable (deprecation, whatever) and add a getIndexReader method. Make the getIndexReader method on MultiSearcher return a MultiReader over the underlying Searchables' Readers. Then, in MultiSearcher's rewrite, the query can be rewritten on a temporary IndexSearcher that uses the MultiReader.
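The effect of this proposal can be approximated in a toy model by expanding the wildcard once against the union of all term dictionaries (roughly what a MultiReader-backed rewrite would see) and running that single rewritten query on every index. This is a hypothetical sketch of the idea only: it is not Lucene code, does not use Lucene's MultiReader, and ignores the Searchable/RemoteSearchable problem. Class and method names are made up.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model, not Lucene code: expand the wildcard once against the
// union of all indexes' terms, then evaluate that one rewritten query on every
// document in every index.
public class MergedExpansionModel {

    // Expand a prefix wildcard against the terms of ALL indexes at once.
    static Set<String> expandAcross(String prefix, List<List<Set<String>>> indexes) {
        Set<String> terms = new TreeSet<>();
        for (List<Set<String>> index : indexes) {
            for (Set<String> doc : index) {
                for (String term : doc) {
                    if (term.startsWith(prefix)) {
                        terms.add(term);
                    }
                }
            }
        }
        return terms;
    }

    // Count hits for "+lucene -bug*" across all indexes using the single expansion.
    static int countHits(List<List<Set<String>>> indexes) {
        Set<String> excluded = expandAcross("bug", indexes);
        int hits = 0;
        for (List<Set<String>> index : indexes) {
            for (Set<String> doc : index) {
                if (doc.contains("lucene") && Collections.disjoint(doc, excluded)) {
                    hits++;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<String> bugDoc = Set.of("lucene", "bug");
        Set<String> bugsDoc = Set.of("lucene", "bugs");
        Set<String> cleanDoc = Set.of("lucene", "search");
        List<Set<String>> empty = List.of();

        // Full index plus an empty index, and docs split across two indexes.
        List<List<Set<String>>> withEmpty = List.of(List.of(bugDoc, bugsDoc, cleanDoc), empty);
        List<List<Set<String>>> split = List.of(List.of(bugDoc, cleanDoc), List.of(bugsDoc));

        System.out.println(countHits(withEmpty)); // 1: only cleanDoc matches
        System.out.println(countHits(split));     // 1: only cleanDoc matches
    }
}
```

In both layouts only the doc without a bug* term survives, which is the result a single index would give.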

I've tried it quickly, but I haven't thought through all the ramifications. I wouldn't be surprised if there were some big ones (being able to get an IndexReader off a Searchable would be quite a change)...and obviously changing Searchable pretty much sucks. Any other ideas, though? I can't think of another way to make the combine method work correctly without recursing down the query and doing some really nasty bookkeeping.


Mark Miller added a comment - 11/Jun/08 08:56 PM
Okay, I clearly underestimated the difficulties of this due to RemoteSearchable. I don't see how it can be done in any efficient manner when you have to work off a Searchable, and obviously you can't do anything with multiple Readers using the MultiSearcher on the client side, so it would seem making the Query.combine method work is the only option...except that doing such a thing would be really nasty I think.

Mark Miller added a comment - 21/Aug/08 12:07 PM
This is actually a dupe of an older issue.

There is no clean way to fix it with the current Searchable API. Avoid MultiSearcher if you can <g>


steve halsey added a comment - 21/Aug/08 02:14 PM
Hi Mark,

OK. Thanks for that. It is a rarely seen problem, because most of the time
when people want to eliminate a word, e.g. lucen*, all of the terms
will be in both halves of the index, so the query expansion will be
accurate and the query will work.
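This observation can be stated as a condition in a toy model: if every index expands the wildcard to the same set of terms, the per-index rewrites are identical, so ORing them gives the same query as one expansion across all indexes. This hypothetical sketch (not Lucene code; names are made up) only checks that condition.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model, not Lucene code: checks whether every index expands
// a prefix wildcard to the same terms. When they all agree, ORing identical
// per-index rewrites is equivalent to one expansion across all indexes.
public class ExpansionAgreementCheck {

    // Expand a prefix wildcard against the terms that occur in a single index.
    static Set<String> expand(String prefix, List<Set<String>> index) {
        Set<String> terms = new TreeSet<>();
        for (Set<String> doc : index) {
            for (String term : doc) {
                if (term.startsWith(prefix)) {
                    terms.add(term);
                }
            }
        }
        return terms;
    }

    // True if all indexes produce the same expansion (vacuously true for no indexes).
    static boolean expansionsAgree(String prefix, List<List<Set<String>>> indexes) {
        Set<String> first = null;
        for (List<Set<String>> index : indexes) {
            Set<String> terms = expand(prefix, index);
            if (first == null) {
                first = terms;
            } else if (!first.equals(terms)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Set<String>>> common = List.of(
                List.of(Set.of("lucene", "search")),
                List.of(Set.of("lucene", "index")));
        List<List<Set<String>>> skewed = List.of(
                List.of(Set.of("lucene", "bug")),
                List.of(Set.of("lucene", "bugs")));

        System.out.println(expansionsAgree("lucen", common)); // true: both expand to [lucene]
        System.out.println(expansionsAgree("bug", skewed));   // false: [bug] vs [bugs]
    }
}
```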

Cheers

steve