Key: LUCENE-1187
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Minor
Assignee: Michael Busch
Reporter: Paul Elschot
Votes: 0
Watchers: 1
Lucene - Java

Things to be done now that Filter is independent from BitSet

Created: 23/Feb/08 10:33 AM   Updated: 03/Jun/08 07:08 PM
Component/s: contrib/*, Search
Affects Version/s: None
Fix Version/s: 2.4

Time Tracking:
Not Specified

File Attachments:
All attachments are licensed for inclusion in ASF works.
File                                          Date                 Author         Size
BooleanFilter20080325.patch                   2008-03-25 07:48 PM  Paul Elschot   17 kB
ChainedFilterAndCachingFilterTest.patch       2008-03-13 10:54 PM  Mark Miller    2 kB
Contrib20080325.patch                         2008-03-25 10:31 PM  Paul Elschot   26 kB
Contrib20080326.patch                         2008-03-26 06:23 PM  Paul Elschot   27 kB
Contrib20080427.patch                         2008-04-27 07:19 PM  Paul Elschot   28 kB
javadocsZero2Match.patch                      2008-02-26 09:30 PM  Paul Elschot   2 kB
lucene-1187.patch                             2008-05-22 10:57 PM  Michael Busch  48 kB
lucene-1187.patch                             2008-05-22 08:15 AM  Michael Busch  32 kB
OpenBitSetDISI-20080322.patch                 2008-03-22 09:44 PM  Paul Elschot   2 kB

Lucene Fields: Patch Available, New
Resolution Date: 23/May/08 07:27 PM


Description:
(Aside: where is the documentation on how to mark up text in jira comments?)

The following things are left over after LUCENE-584:

For Lucene 3.0 Filter.bits() will have to be removed.

There is a CHECKME in IndexSearcher about using ConjunctionScorer to have the boolean behaviour of a Filter.

I have not looked into Filter caching yet, but I suppose there will be some room for improvement there.
IIRC the current core has moved to use OpenBitSetFilter, and that is probably what is being cached.
In some cases it might be better to cache a SortedVIntList instead.

Boolean logic on DocIdSetIterator is already available for Scorers (that inherit from DocIdSetIterator) in the search package. This is currently implemented by ConjunctionScorer, DisjunctionSumScorer, ReqOptSumScorer and ReqExclScorer.
Boolean logic on BitSets is available in contrib/misc and contrib/queries.
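
As an illustration of boolean logic on doc id iterators, here is a minimal sketch of a conjunction (AND) that leapfrogs two sorted iterators. The names SimpleDocIdIterator, SortedArrayIterator and ConjunctionSketch are hypothetical stand-ins made up for this example; they are not the Lucene classes, and this is not how ConjunctionScorer is actually written.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a doc id iterator; NOT the Lucene DocIdSetIterator. */
interface SimpleDocIdIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Returns the first doc id >= target, or NO_MORE_DOCS if there is none. */
    int advance(int target);
}

/** Iterates over a sorted int array (ascending, no duplicates). */
final class SortedArrayIterator implements SimpleDocIdIterator {
    private final int[] docs;
    private int pos = 0;

    SortedArrayIterator(int... docs) {
        this.docs = docs;
    }

    @Override
    public int advance(int target) {
        // Linear scan forward; the position never moves backwards.
        while (pos < docs.length && docs[pos] < target) {
            pos++;
        }
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
}

final class ConjunctionSketch {
    /** Collects the doc ids present in both iterators. */
    static List<Integer> intersect(SimpleDocIdIterator a, SimpleDocIdIterator b) {
        List<Integer> result = new ArrayList<>();
        int docA = a.advance(0);
        int docB = b.advance(0);
        while (docA != SimpleDocIdIterator.NO_MORE_DOCS
                && docB != SimpleDocIdIterator.NO_MORE_DOCS) {
            if (docA == docB) {
                result.add(docA);            // both sides agree: a match
                docA = a.advance(docA + 1);
                docB = b.advance(docB + 1);
            } else if (docA < docB) {
                docA = a.advance(docB);      // let the lagging side catch up
            } else {
                docB = b.advance(docA);
            }
        }
        return result;
    }
}
```

The same leapfrog pattern works on any pair of sorted iterators, which is why it could in principle be shared between Scorers and plain Filters.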

DisjunctionSumScorer calls score() on its subscorers before the score value is actually needed.
This could be a reason to introduce a DisjunctionDocIdSetIterator, perhaps as a superclass of DisjunctionSumScorer.
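
A rough sketch of what such a doc-id-only disjunction could look like follows. It merges sorted doc id lists with a priority queue and never computes a score. The class DisjunctionDocIdsSketch and its Cursor helper are hypothetical names for this example, and plain int arrays stand in for the subscorers; this is an assumption about one possible shape, not the proposed Lucene class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class DisjunctionDocIdsSketch {
    /** Cursor over one sorted doc id array; a hypothetical stand-in for a subscorer. */
    private static final class Cursor {
        final int[] docs;
        int pos = 0;

        Cursor(int[] docs) {
            this.docs = docs;
        }

        int doc() {
            return docs[pos];
        }
    }

    /** Returns the sorted union of the given sorted doc id lists, without duplicates. */
    static List<Integer> union(int[]... sortedDocLists) {
        // The queue keeps the cursor with the smallest current doc id on top.
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
        for (int[] docs : sortedDocLists) {
            if (docs.length > 0) {
                queue.add(new Cursor(docs));
            }
        }
        List<Integer> result = new ArrayList<>();
        while (!queue.isEmpty()) {
            int doc = queue.peek().doc();
            result.add(doc);
            // Advance every cursor positioned on this doc; no score is computed.
            while (!queue.isEmpty() && queue.peek().doc() == doc) {
                Cursor c = queue.poll();
                c.pos++;
                if (c.pos < c.docs.length) {
                    queue.add(c);
                }
            }
        }
        return result;
    }
}
```

A scoring subclass could then sum the scores of the cursors that sit on the current doc only when a score is requested.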

To fully implement non-scoring queries a TermDocIdSetIterator will be needed, perhaps as a superclass of TermScorer.

The javadocs in org.apache.lucene.search and their use of matching vs. non-zero score:
I'll investigate this soon and provide a patch if necessary.

An early version of the patches of LUCENE-584 contained a class Matcher
that differs from the current DocIdSet in that Matcher has an explain() method.
It remains to be seen whether such a Matcher could be useful between
DocIdSet and Scorer.

The semantics of scorer.skipTo(scorer.doc()) were discussed briefly.
This was also discussed in another issue recently, so perhaps it is worthwhile to open a separate issue for this.

Skipping on a SortedVIntList is done using linear search. This could be improved by adding multilevel skiplist info, much like in the Lucene index for documents containing a term.
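
To make the idea concrete, here is a minimal sketch with a single skip level (not the multilevel structure suggested above) on top of a delta-encoded list. The class SkippingDeltaList is a hypothetical name for this example; an int array of deltas stands in for the VInt-encoded bytes, and the skip interval is an arbitrary parameter.

```java
final class SkippingDeltaList {
    private final int[] deltas;    // deltas[i] = docs[i] - docs[i - 1], with deltas[0] = docs[0]
    private final int[] skipDocs;  // skipDocs[j] = absolute doc id at index j * interval
    private final int interval;

    /** sortedDocs must be ascending; interval must be at least 1. */
    SkippingDeltaList(int[] sortedDocs, int interval) {
        this.interval = interval;
        this.deltas = new int[sortedDocs.length];
        this.skipDocs = new int[(sortedDocs.length + interval - 1) / interval];
        int prev = 0;
        for (int i = 0; i < sortedDocs.length; i++) {
            deltas[i] = sortedDocs[i] - prev;
            prev = sortedDocs[i];
            if (i % interval == 0) {
                skipDocs[i / interval] = sortedDocs[i];
            }
        }
    }

    /** Returns the first doc id >= target, or -1 if there is none. */
    int firstAtLeast(int target) {
        // Binary search for the last skip entry whose doc id is <= target.
        int lo = 0;
        int hi = skipDocs.length - 1;
        int block = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (skipDocs[mid] <= target) {
                block = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (block < 0) {
            // Every doc is larger than target (or the list is empty).
            return skipDocs.length == 0 ? -1 : skipDocs[0];
        }
        // Decode deltas linearly, starting from the chosen skip point.
        int i = block * interval;
        int doc = skipDocs[block];
        while (doc < target) {
            i++;
            if (i == deltas.length) {
                return -1;
            }
            doc += deltas[i];
        }
        return doc;
    }
}
```

With one skip level the linear part is bounded by the interval; further levels would shorten the search over the skip entries themselves.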

One comment by me of 3 Dec 2008:

A few complete (test) classes are deprecated; it might be good to add the target release for removal there.


