[LUCENE-295] [PATCH] MultiSearcher problems with Similarity.docFreq() - ASF JIRA

Details

Type: Bug
Status: Patch Available
Priority: Major
Resolution: Fixed
Affects Version/s: 1.4
Fix Version/s: None
Component/s: core/search
Labels: None
Environment:

Operating System: other
Platform: All

Bugzilla Id: 31841

Description

When MultiSearcher invokes its subsearchers, Similarity.docFreq() reads each
subsearcher's own docFreq(). As a result, idf values are computed locally for
each index rather than globally, so ranking across multiple indices is not
equivalent to ranking over a single index that holds the entire collection.
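
To make the effect concrete, here is a minimal sketch with made-up document counts. It assumes the idf formula used by Lucene's DefaultSimilarity, ln(numDocs / (docFreq + 1)) + 1; the class name IdfSketch and the numbers are hypothetical and not part of any attached patch.

```java
// Hypothetical numbers; only the idf formula follows Lucene's DefaultSimilarity.
class IdfSketch {
    /** idf = ln(numDocs / (docFreq + 1)) + 1 */
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        // Index A: 100 documents, the term occurs in 1 of them.
        // Index B: 100 documents, the term occurs in 50 of them.
        double idfA = idf(1, 100);
        double idfB = idf(50, 100);
        // A single index holding both collections: 200 documents, 51 matches.
        double idfGlobal = idf(1 + 50, 100 + 100);
        System.out.printf("local A: %.3f, local B: %.3f, global: %.3f%n",
                idfA, idfB, idfGlobal);
    }
}
```

With local statistics the same term is weighted differently in each index, so scores from the two subsearchers are not directly comparable when MultiSearcher merges them.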

The attached files (if I can figure out how to attach them) provide a potential
partial solution for this. They properly fix a simple test case, RankingTest,
that was provided by Daniel Naber.

The changes are:
1. Searcher: Add a topmostSearcher field, with getter and setter, that records
the outermost Searcher. It defaults to this.
2. MultiSearcher: Pass down the topmostSearcher when creating the subsearchers.
3. IndexSearcher: Call Query.weight() everywhere with the topmostSearcher
instead of this.
4. Query: Provide a default implementation of Query.combine() so that
MultiSearcher works with all queries.
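
The first three changes can be sketched roughly as follows. This is a simplified illustration, not the attached patch: the Sketch* classes are hypothetical stand-ins for Searcher, IndexSearcher and MultiSearcher, terms are plain strings, and the idf() method stands in for the statistics lookup that happens when a Weight is created.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical stand-ins; these are not the real Lucene classes.
abstract class SketchSearcher {
    // Change 1: remember the outermost searcher; defaults to this.
    private SketchSearcher topmost = this;

    SketchSearcher getTopmostSearcher() { return topmost; }
    void setTopmostSearcher(SketchSearcher s) { topmost = s; }

    abstract int docFreq(String term);
    abstract int maxDoc();

    // Change 3: weighting statistics come from the topmost searcher, not from this.
    double idf(String term) {
        SketchSearcher top = getTopmostSearcher();
        return Math.log(top.maxDoc() / (double) (top.docFreq(term) + 1)) + 1.0;
    }
}

class SketchIndexSearcher extends SketchSearcher {
    private final Map<String, Integer> docFreqs = new HashMap<>();
    private final int maxDoc;

    SketchIndexSearcher(int maxDoc) { this.maxDoc = maxDoc; }

    // Test helper: registers how many documents contain the term.
    SketchIndexSearcher with(String term, int df) {
        docFreqs.put(term, df);
        return this;
    }

    @Override int docFreq(String term) { return docFreqs.getOrDefault(term, 0); }
    @Override int maxDoc() { return maxDoc; }
}

class SketchMultiSearcher extends SketchSearcher {
    private final SketchSearcher[] subs;

    SketchMultiSearcher(SketchSearcher... subs) {
        this.subs = subs;
        setTopmostSearcher(this); // Change 2: pass the topmost searcher down.
    }

    // Propagate so that nested multi-searchers also reach the outermost one.
    @Override void setTopmostSearcher(SketchSearcher s) {
        super.setTopmostSearcher(s);
        for (SketchSearcher sub : subs) sub.setTopmostSearcher(s);
    }

    // Global statistics are the sums over all subsearchers.
    @Override int docFreq(String term) {
        int sum = 0;
        for (SketchSearcher sub : subs) sum += sub.docFreq(term);
        return sum;
    }

    @Override int maxDoc() {
        int sum = 0;
        for (SketchSearcher sub : subs) sum += sub.maxDoc();
        return sum;
    }
}
```

Once two SketchIndexSearcher instances are wrapped in a SketchMultiSearcher, calling idf() on either subsearcher returns the same value as calling it on the wrapper, which is the property the ranking test needs.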

Problems or possible problems I see:
1. This does not address the same issue with RemoteSearchable.
RemoteSearchable is not a Searcher, nor can it be due to lack of multiple
inheritance in Java, but Query.weight() requires a Searcher. Perhaps
Query.weight() should be changed to take a Searchable, but this requires
changing many places and I suspect would break apps.
2. There may be other places that topmostSearcher should be used instead of this.
3. The default implementation for Query.combine() is a guess on my part - it
works for TermQuery. It's fragile in that the default implementation will hide
bugs caused by queries that inadvertently omit a more precise Query.combine()
method.
4. The prior comment on Query.combine() indicates that whoever wrote it was
fully aware of this problem and so probably had another usage in mind, so the
whole issue may just be Daniel's usage in the test case. It's not apparent to
me, so I probably don't understand something.
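
For problem 3, one way a default combine() could avoid silently hiding bugs is to accept only the trivial case and fail loudly otherwise. The sketch below is an assumption for illustration, not the default implementation from the attached patch; CombineSketch and its generic signature are hypothetical, and Q stands in for Query. It relies on the per-index rewritten queries having a meaningful equals(), which holds for a query that rewrites to itself on every index.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; Q stands in for Lucene's Query type.
final class CombineSketch {
    /**
     * Collapses the queries produced by rewriting the same query against each
     * subsearcher. If they are all equal, any one of them represents the
     * combined query; otherwise the query type needs its own combine logic.
     */
    static <Q> Q combine(List<Q> rewrittenPerIndex) {
        Set<Q> unique = new LinkedHashSet<>(rewrittenPerIndex);
        if (unique.size() == 1) {
            return unique.iterator().next();
        }
        throw new UnsupportedOperationException(
                "rewritten queries differ across indices; a specific combine() is required");
    }
}
```

The trade-off is that query types whose rewrite depends on index contents would throw instead of returning a possibly wrong combined query.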

Attachments

ASF.LICENSE.NOT.GRANTED--combine-fix.patch (27/Apr/05 05:51, 5 kB, Chuck Williams)
ASF.LICENSE.NOT.GRANTED--combine-fix2.diff (27/Apr/05 19:03, 13 kB, Wolf Siberski)
ASF.LICENSE.NOT.GRANTED--multisearcher.diff (21/Jun/05 16:44, 14 kB, Wolf Siberski)
ASF.LICENSE.NOT.GRANTED--multisearcher.diff (10/Feb/05 19:33, 20 kB, Wolf Siberski)
ASF.LICENSE.NOT.GRANTED--multisearcher.diff (12/Nov/04 17:39, 2 kB, Wolf Siberski)
ASF.LICENSE.NOT.GRANTED--multisearcher2.diff (15/Nov/04 21:24, 13 kB, Wolf Siberski)
ASF.LICENSE.NOT.GRANTED--multisearcher-2005-02-18b.diff (18/Feb/05 20:33, 47 kB, Wolf Siberski)
ASF.LICENSE.NOT.GRANTED--multisearcher-2005-02-22c.diff (23/Feb/05 00:33, 48 kB, Wolf Siberski)
ASF.LICENSE.NOT.GRANTED--multisearcher-2005-04-19.diff (20/Apr/05 12:13, 43 kB, Otis Gospodnetic)
ASF.LICENSE.NOT.GRANTED--multisearcher-deprecation.diff (26/Apr/05 16:34, 19 kB, Wolf Siberski)
ASF.LICENSE.NOT.GRANTED--multisearcher-deprecation.diff (22/Apr/05 17:08, 20 kB, Wolf Siberski)
ASF.LICENSE.NOT.GRANTED--MultiSearcherPatch.zip (22/Oct/04 08:15, 7 kB, Chuck Williams)
ASF.LICENSE.NOT.GRANTED--patch.diff (23/Oct/04 01:58, 4 kB, Daniel Naber)

People

Assignee: Lucene Developers
Reporter: Chuck Williams
Votes: 0
Watchers: 0

Dates

Created: 22/Oct/04 08:11
Updated: 5 days ago 16:16
Resolved: 27/May/06 01:38