|
Abdul Chaudhry made changes - 02/Oct/05 02:32 PM
I like performance, so I'm all for this.
One thing about the code: it has a few probably superfluous casts to (Scorer) from Scorer array elements. Regards, Simplified version, removed most casts, inlined first() and last().
Passes all tests here. Regards,
Paul Elschot made changes - 03/Oct/05 04:22 AM
Thanks Paul, re-ran load tests here, everything looks good and it looks easier to read as well.
I also checked Arrays.sort, it's a merge sort - guaranteed nlogn. This all looks squeaky clean. Cheers, – Abdul The score() method can be simplified to use only the 'i' index. There is no need
to offset the index into the scorers with the 'pos' variable, because all scorers match. Regards, ok, this makes sense as the scoring engine runs something like this
while (scorer.next()) { That is, next() will have ordered everything, so that by the time we call the scorer.score() method , everything should be in-order. Thanks, ill give that a go. The impression I have with lucene, and correct me if Im wrong, is that complex queries with many terms and clauses have their bottleneck in terms of performance in the ordering phase, that is scorer.next() requires everything to be in-document order and all the scorer sub-engines must comply. Collection is a moot point as you probably have small numbers of hits. However, on the other end of the scale, for queries with one or two terms that have a very high frequency the bottleneck is really in collection, that is the priority queue in collector.collect(), Once the ConjunctionScorer matches the order of the subqueries/subscorers does not matter
because they all match and a sum score needs to be formed. Scorer.next() can only tolerate documents not to be in order at top level, where hit collection is done. Regards, It seems like more cruft could be removed if support for incremental adding of subscorers was removed.
Is there a reason we can't simplify things and just pass Collection<Scorer> in the constructor? Yonik, Paul, do either of you know the status on this one? From the looks of it, it hasn't been implemented. It also has the highest number of votes in JIRA, so I thought I would take a look at it. One downside is it is not in patch form, but it also doesn't look to hard to extract the changes, either.
One issue I have with these performance issues is that we don't have a reliable benchmarking suite. I am not a lawyer, but might we be able to use something like http://trec.nist.gov/data/reuters/reuters.html Iirc the orginal performance problem was caused by creation of objects in the tight loop
doing skipTo() on al the scorers. This patch is against current trunk and based on the earlier posted versions of ConjunctionScorer.
Paul Elschot made changes - 21/Sep/06 07:17 AM
I just overlooked the grant by Abdul to the ASF.
Thanks! I just committed this.
Yonik Seeley made changes - 17/Oct/06 08:51 PM
Closing all issues that were resolved for 2.1.
Michael McCandless made changes - 27/Feb/07 06:10 PM
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The first(), last() and doc() method calls should probably be in-lined - maybe.