Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
New
Description
This is just an exploratory patch ... still many nocommits, but I
think it may be promising.
I find the two booleans we pass to Weight.scorer confusing, because
they really only apply to whoever will call score(Collector) (just
IndexSearcher and BooleanScorer).
The params are pointless for the vast majority of scorers, because
very, very few query scorers really need to change how top-scoring is
done, and those scorers can only score top-level (throw throw UOE
from nextDoc/advance). It seems like these two types of scorers
should be separately typed.
Attachments
Attachments
- LUCENE-5487.patch
- 130 kB
- Michael McCandless
- LUCENE-5487.patch
- 86 kB
- Michael McCandless
- LUCENE-5487.patch
- 84 kB
- Michael McCandless
Activity
Hi Mike,
I think, TopScorer.java is missing in your patch! Otherwise looks great. You also implemented ConstantScorer's crazy wrapping for TopScorer. Now much easier to understand!
Uwe
Thanks! The javadocs of TopScorer are outdated. The firstDocId is no longer passed. I don't like this change, because in the default impl inside Weight.java, we now need the additional code to check if scorer is still on -1 or not. You added a comment about the setScorer, too. Mabye there is room for improvements, by:
- documenting API in a better way (sorry the crazyness of this score(Collector, maxDoc) stuff is horrible to understand, especially, if you want to implement it correctly!)
- possibly move some code away, so score(Collector) does not need to do the checks on the underlying sequential scorer
Otherwise, I like the 2 separate classes!
Another idea I had: we can make TopScorer an interface. So Scorers that support topScoring just need to implement the interface - code looks like before... Have not thought more about it!
I like this, lots of simplifications.
About this nocommit:
+ // nocommit add test verifying that BQ inside BQ can get BS1
+ // not BS2 like today
Iirc a BS1 inside a BS1 will work correctly because both use the same doc intervals in which the unordered docs can occur.
I do not recall anyone reporting a problem with this with nested BQ's, but I never tested it either.
And the doc interval might be increased (2048 now) to try and have even better BS1 performance.
The javadocs of TopScorer are outdated. The firstDocId is no longer passed.
Woops I'll fix.
I don't like this change, because in the default impl inside Weight.java, we now need the additional code to check if scorer is still on -1 or not.
Yeah ... not sure what to do about it. It became hard to know the firstDocID since consumers of TopScorer can't .nextDoc. Maybe there is another way ...
(sorry the crazyness of this score(Collector, maxDoc) stuff is horrible to understand, especially, if you want to implement it correctly!)
It is ... and I think we only have this version (the one that takes 2nd param, int max) for the "BS1 embeds BS1" case? Maybe we don't need to optimize this case? (Maybe a BQ embedded a BQ should rewrite itself to a single BQ, if somehow we could account for the proper coord scoring).
Another idea I had: we can make TopScorer an interface
That's a neat idea! So then e.g. IS.search (and BS1 embeds BS1 case) would check if the Scorer it pulled impls TopScorer, and if so, cast it and call .score(Collector); else, it would fallback to the default impl.
Iirc a BS1 inside a BS1 will work correctly because both use the same doc intervals in which the unordered docs can occur.
I think today if you nest two BQs, e.g. all SHOULD clauses, the outer BQ will use BS2 and the inner one will use BS1. This is because BQ's Weight.scorer passes "true" for topScorer when it pulls the sub-scorers.
But, the patch should fix this I think, so that BS1 sub-scorer can be used. So the nocommit is to make a test confirming this is really happening... (it's hairy!).
Commit 1573830 from mikemccand in branch 'dev/branches/lucene5487'
[ https://svn.apache.org/r1573830 ]
LUCENE-5487: commit current patch
I made a branch & committed the last patch: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene5487
Commit 1574169 from mikemccand in branch 'dev/branches/lucene5487'
[ https://svn.apache.org/r1574169 ]
LUCENE-5487: also wrap TopScorer in AssertingWeight
Commit 1574188 from mikemccand in branch 'dev/branches/lucene5487'
[ https://svn.apache.org/r1574188 ]
LUCENE-5487: add test case verifying we can embed BS1 inside BS1
Commit 1574434 from mikemccand in branch 'dev/branches/lucene5487'
[ https://svn.apache.org/r1574434 ]
LUCENE-5487: add test case verifying a query time join inside a BQ still gets the optimized top-level scorer (it doesn't today); fixed nocommits
Commit 1575057 from mikemccand in branch 'dev/branches/lucene5487'
[ https://svn.apache.org/r1575057 ]
LUCENE-5487: add TopScorers to FilteredQuery too; fix Solr; resolve all nocommits
Rob suggested BulkScorer as a better name than TopScorer ... I like it ... I'll rename it.
Commit 1575234 from mikemccand in branch 'dev/branches/lucene5487'
[ https://svn.apache.org/r1575234 ]
LUCENE-5487: rename TopScorer -> BulkScorer
Commit 1575397 from mikemccand in branch 'dev/branches/lucene5487'
[ https://svn.apache.org/r1575397 ]
LUCENE-5487: merge trunk
Applyable patch from the branch (generated by diffSources.py), I think it's ready.
I think this (Weight.scorer) is an expert enough API that we can fix it in 4.8 as well?
I think this (Weight.scorer) is an expert enough API that we can fix it in 4.8 as well?
Definitely, I havent had a chance to review the patch but this seems good to do.
In this case no problem, but just as an FYI, if you have trunk/ and branch/, you always want to run the differ outside of both (this way the patch prefixes are the same: this one cant be applied by any patch tool).
But this is no problem for me to review, i can just switch to your branch to see the context!
- docs for Weight.scoresDocsOutOfOrder() should refer to bulkScorer() instead of scorer()
- a TODO should be added for BooleanWeight.scoresDocsOutOfOrder: its "out of sync" in the sense that it only checks for any required clause, but e.g. if someone has minNrShouldMatch > 1, it will lie and say its out of order, but then go and do in-order scoring (but with a slower collector). This is not new and not caused by your patch... and really it would be ideal if somehow the logic was in one place rather than duplicated.
- should FakeScorer really not take the real Weight anymore? I don't know how useful it is, but its wierd that its null, since the Collector actually sees this thing via setScorer: if its not going to be supported then i think it should override getWeight to explicitly throw UOE?
- what is the purpose of a LeapFrogBulkScorer? It seems to just use two in-order scorers, i dont understand its purpose. Is this supposed to be a code specialization? If so, how much faster is it than just... not doing that and using an in-order scorer in this case? (it seems maybe the latter way is actually faster, as you get the correct collector, same issue as BooleanWeight with minNrShouldMatch as mentioned above)
Thanks Rob!
In this case no problem, but just as an FYI, if you have trunk/ and branch/, you always want to run the differ outside of both (this way the patch prefixes are the same: this one cant be applied by any patch tool).
I actually did that at first but for some reason I thought the resulting patch file was wrong! Next time I'll do it like that.
docs for Weight.scoresDocsOutOfOrder() should refer to bulkScorer() instead of scorer()
Oh yeah, I'll fix.
a TODO should be added for BooleanWeight.scoresDocsOutOfOrder
I fixed this and added a simple test, on the branch.
should FakeScorer really not take the real Weight anymore? I don't know how useful it is, but its wierd that its null, since the Collector actually sees this thing via setScorer: if its not going to be supported then i think it should override getWeight to explicitly throw UOE?
It seems weird returning a real Weight when everything else is fake, but I guess we can just leave it as it was (in BooleanQuery)? All the other FakeScorers seem to do the null Weight thing, but I agree if we do that we should just override getWeight to throw UOE.
what is the purpose of a LeapFrogBulkScorer? It seems to just use two in-order scorers, i dont understand its purpose. Is this supposed to be a code specialization?
It was there before, but I think it's just code specialization ... I'll just nuke it and let Weight.bulkScorer do the default impl.
Commit 1576066 from mikemccand in branch 'dev/branches/lucene5487'
[ https://svn.apache.org/r1576066 ]
LUCENE-5487: add feedback from Rob
Commit 1576096 from mikemccand in branch 'dev/branches/lucene5487'
[ https://svn.apache.org/r1576096 ]
LUCENE-5487: throw OUE from FakeScorer.getWeight
Commit 1576273 from mikemccand in branch 'dev/branches/lucene5487'
[ https://svn.apache.org/r1576273 ]
LUCENE-5487: consolidate the FakeScorers within one package
Commit 1576274 from mikemccand in branch 'dev/branches/lucene5487'
[ https://svn.apache.org/r1576274 ]
LUCENE-5487: consolidate the FakeScorers within one package
Commit 1576473 from mikemccand in branch 'dev/branches/lucene5487'
[ https://svn.apache.org/r1576473 ]
LUCENE-5487: merge trunk
OK I think this is ready ... I'll reintegrate the branch, add CHANGES and commit! Thanks for the review Rob & Uwe!
Commit 1576487 from mikemccand in branch 'dev/trunk'
[ https://svn.apache.org/r1576487 ]
LUCENE-5487: separate Weight.bulkScorer and Weight.scorer
Commit 1576492 from mikemccand in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1576492 ]
LUCENE-5487: separate Weight.bulkScorer and Weight.scorer
Commit 1585967 from mikemccand@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1585967 ]
LUCENE-5554: break out separate Weight.scoreRange/scoreAll so hotspot is relatively happy again after LUCENE-5487
Commit 1585968 from mikemccand@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1585968 ]
LUCENE-5554: break out separate Weight.scoreRange/scoreAll so hotspot is relatively happy again after LUCENE-5487
mikemccand could save me from starring at "broke out separate Weight.scoreRange/scoreAll methods" for a few mins, if it was clued by a comment mentions #hotspot
Commit 1589416 from mikemccand in branch 'dev/trunk'
[ https://svn.apache.org/r1589416 ]
LUCENE-5487: add comment explaining hotspot voodoo
Michael McCandless could save me from starring at "broke out separate Weight.scoreRange/scoreAll methods" for a few mins, if it was clued by a comment mentions #hotspot
I committed a fix ...
Commit 1589422 from mikemccand in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1589422 ]
LUCENE-5487: add comment explaining hotspot voodoo
Patch. I made a new class called TopScorer (we can pick a better
name...), separated out a Weight.topScorer, with the obvious default
impl, and then removed the two boolean params from Weight.scorer.
Tests pass.