Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I think we would benefit from having an AssertingScorer that would assert that scorers are advanced correctly, return valid scores (eg. not NaN), ...

      1. LUCENE-4903.patch
        81 kB
        Adrien Grand
      2. LUCENE-4903.patch
        88 kB
        Adrien Grand

        Activity

        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] jpountz
        http://svn.apache.org/viewvc?view=revision&revision=1466719

        LUCENE-4903: Add AssertingScorer.

        Show
        Commit Tag Bot added a comment - [branch_4x commit] jpountz http://svn.apache.org/viewvc?view=revision&revision=1466719 LUCENE-4903 : Add AssertingScorer.
        Adrien Grand made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hide
        Adrien Grand added a comment -

        I just committed. Hopefully this will find bugs in Scorers!

        Show
        Adrien Grand added a comment - I just committed. Hopefully this will find bugs in Scorers!
        Hide
        Commit Tag Bot added a comment -

        [trunk commit] jpountz
        http://svn.apache.org/viewvc?view=revision&revision=1466709

        LUCENE-4903: Add AssertingScorer.

        Show
        Commit Tag Bot added a comment - [trunk commit] jpountz http://svn.apache.org/viewvc?view=revision&revision=1466709 LUCENE-4903 : Add AssertingScorer.
        Hide
        Robert Muir added a comment -

        I'd just commit the thing

        I'm happy you are doing this work and happy to try to look at patches... but this would really improve our tests

        Show
        Robert Muir added a comment - I'd just commit the thing I'm happy you are doing this work and happy to try to look at patches... but this would really improve our tests
        Hide
        Adrien Grand added a comment -

        So we don't need the weak map anymore right?

        It could still be useful to Scorers that override score(Collector collector) and call collector.setScorer(this) in the body of this method I think.

        maybe AssertingWeight's scorer() method should create a new Random(random.nextLong()) to pass to the AssertingScorer when it creates it?

        Good point. I'll update the patch.

        Show
        Adrien Grand added a comment - So we don't need the weak map anymore right? It could still be useful to Scorers that override score(Collector collector) and call collector.setScorer(this) in the body of this method I think. maybe AssertingWeight's scorer() method should create a new Random(random.nextLong()) to pass to the AssertingScorer when it creates it? Good point. I'll update the patch.
        Hide
        Robert Muir added a comment -

        maybe AssertingWeight's scorer() method should create a new Random(random.nextLong()) to pass to the AssertingScorer when it creates it? This way when AssertingIndexSearcher wraps an IndexSearcher with an executorService configured, things are more likely to reproduce.

        Show
        Robert Muir added a comment - maybe AssertingWeight's scorer() method should create a new Random(random.nextLong()) to pass to the AssertingScorer when it creates it? This way when AssertingIndexSearcher wraps an IndexSearcher with an executorService configured, things are more likely to reproduce.
        Hide
        Robert Muir added a comment -

        So we don't need the weak map anymore right?

        Show
        Robert Muir added a comment - So we don't need the weak map anymore right?
        Adrien Grand made changes -
        Attachment LUCENE-4903.patch [ 12577929 ]
        Hide
        Adrien Grand added a comment - - edited

        New patch:

        • borrows Robert's idea to not delegate if the method has not been overridden,
        • AssertingScorer.score(Collector) either calls score(Collector) or score(Collector, NO_MORE_DOCS, nextDoc()) depending on random().nextBoolean()
        • modifies some join scorers so that nextDoc throws UOE instead of iterating out of order
        • adds an assertion to Scorer.score(Collector) to make sure that iteration has not started before this method is called
        • adds an assertion to Scorer.score(Collector, int, int) to make sure that docID() == firstDocID
        Show
        Adrien Grand added a comment - - edited New patch: borrows Robert's idea to not delegate if the method has not been overridden, AssertingScorer.score(Collector) either calls score(Collector) or score(Collector, NO_MORE_DOCS, nextDoc()) depending on random().nextBoolean() modifies some join scorers so that nextDoc throws UOE instead of iterating out of order adds an assertion to Scorer.score(Collector) to make sure that iteration has not started before this method is called adds an assertion to Scorer.score(Collector, int, int) to make sure that docID() == firstDocID
        Hide
        Adrien Grand added a comment -

        This is a good idea, I didn't know of this class. I'll update the patch!

        Show
        Adrien Grand added a comment - This is a good idea, I didn't know of this class. I'll update the patch!
        Hide
        Robert Muir added a comment -

        The problem is that scorers are hard to track: scoring usually happens by calling Scorer.score(Collector), which itself calls Collector.setScorer(Scorer). Since the asserting scorer delegates to the wrapped one, the asserting scorer gets lost, this is why Collector.setScorer tries to get it back by using a weak hash map.

        I'm not totally happy with it either and would really like to make Scorer.score(Collector) use methods from the asserting scorer directly. We can't rely on Scorer.score(Collector)'s default implementation since it relies on Scorer.nextDoc and some scorers such as BooleanScorer don't implement this method.

        Could we alternatively use VirtualMethod to detect if score(Collector)/score(Collector,int,int) are overridden in the underlying scorer? If they aren't, then its safe for AssertingScorer to use its own implementation (possibly with more checks).

        Show
        Robert Muir added a comment - The problem is that scorers are hard to track: scoring usually happens by calling Scorer.score(Collector), which itself calls Collector.setScorer(Scorer). Since the asserting scorer delegates to the wrapped one, the asserting scorer gets lost, this is why Collector.setScorer tries to get it back by using a weak hash map. I'm not totally happy with it either and would really like to make Scorer.score(Collector) use methods from the asserting scorer directly. We can't rely on Scorer.score(Collector)'s default implementation since it relies on Scorer.nextDoc and some scorers such as BooleanScorer don't implement this method. Could we alternatively use VirtualMethod to detect if score(Collector)/score(Collector,int,int) are overridden in the underlying scorer? If they aren't, then its safe for AssertingScorer to use its own implementation (possibly with more checks).
        Hide
        Michael McCandless added a comment -

        Since the asserting scorer delegates to the wrapped one, the asserting scorer gets lost, this is why Collector.setScorer tries to get it back by using a weak hash map.

        OK I see.

        Well I would just commit what you have now We can improve the WeakHashMap solution if necessary later ... this is an awesome step forward. Progress not perfection ...

        Show
        Michael McCandless added a comment - Since the asserting scorer delegates to the wrapped one, the asserting scorer gets lost, this is why Collector.setScorer tries to get it back by using a weak hash map. OK I see. Well I would just commit what you have now We can improve the WeakHashMap solution if necessary later ... this is an awesome step forward. Progress not perfection ...
        Hide
        Adrien Grand added a comment -

        The problem is that scorers are hard to track: scoring usually happens by calling Scorer.score(Collector), which itself calls Collector.setScorer(Scorer). Since the asserting scorer delegates to the wrapped one, the asserting scorer gets lost, this is why Collector.setScorer tries to get it back by using a weak hash map.

        I'm not totally happy with it either and would really like to make Scorer.score(Collector) use methods from the asserting scorer directly. We can't rely on Scorer.score(Collector)'s default implementation since it relies on Scorer.nextDoc and some scorers such as BooleanScorer don't implement this method.

        Show
        Adrien Grand added a comment - The problem is that scorers are hard to track: scoring usually happens by calling Scorer.score(Collector), which itself calls Collector.setScorer(Scorer). Since the asserting scorer delegates to the wrapped one, the asserting scorer gets lost, this is why Collector.setScorer tries to get it back by using a weak hash map. I'm not totally happy with it either and would really like to make Scorer.score(Collector) use methods from the asserting scorer directly. We can't rely on Scorer.score(Collector)'s default implementation since it relies on Scorer.nextDoc and some scorers such as BooleanScorer don't implement this method.
        Hide
        Michael McCandless added a comment -

        Wow, this looks awesome! Nice that you cut over all tests to use newSearcher ... does anything fail?

        Why do you need to track the instances with the WeakHashMap?

        Show
        Michael McCandless added a comment - Wow, this looks awesome! Nice that you cut over all tests to use newSearcher ... does anything fail? Why do you need to track the instances with the WeakHashMap?
        Adrien Grand made changes -
        Field Original Value New Value
        Attachment LUCENE-4903.patch [ 12577228 ]
        Hide
        Adrien Grand added a comment -

        Patch

        • checks for in-order scoring when applicable
        • checks score values (not INFINITY or NaN)
        • checks that Scorer.score() is not called before iteration started or after it finished
        • reuses assertions of DocsEnum on Scorer
        • makes sure that nextDoc() and advance(target) are not called directly on "top scorers" (only from score(Collector)).
        • makes more tests use LuceneTestCase.newSearcher (most of the patch size)
        Show
        Adrien Grand added a comment - Patch checks for in-order scoring when applicable checks score values (not INFINITY or NaN) checks that Scorer.score() is not called before iteration started or after it finished reuses assertions of DocsEnum on Scorer makes sure that nextDoc() and advance(target) are not called directly on "top scorers" (only from score(Collector)). makes more tests use LuceneTestCase.newSearcher (most of the patch size)
        Hide
        Michael McCandless added a comment -

        How can I check for that?

        Well, hopefully we have an AssertingWeight, and then when its scorer() method is returned, it calls .scoresDocsOOO on the wrapped weight, and if that returns true, then the returned AssertingScorer should ensure every returned docID is > the previous one, I think?

        Show
        Michael McCandless added a comment - How can I check for that? Well, hopefully we have an AssertingWeight, and then when its scorer() method is returned, it calls .scoresDocsOOO on the wrapped weight, and if that returns true, then the returned AssertingScorer should ensure every returned docID is > the previous one, I think?
        Hide
        Adrien Grand added a comment -

        It should also confirm that Weight.scoresDocsOutOfOrder didn't lie!

        How can I check for that?

        Show
        Adrien Grand added a comment - It should also confirm that Weight.scoresDocsOutOfOrder didn't lie! How can I check for that?
        Hide
        Robert Muir added a comment -

        and score() etc shouldnt be called after these things return NO_MORE_DOCS.

        Show
        Robert Muir added a comment - and score() etc shouldnt be called after these things return NO_MORE_DOCS.
        Hide
        Michael McCandless added a comment -

        +1

        It should also confirm that Weight.scoresDocsOutOfOrder didn't lie!

        Show
        Michael McCandless added a comment - +1 It should also confirm that Weight.scoresDocsOutOfOrder didn't lie!
        Adrien Grand created issue -

          People

          • Assignee:
            Adrien Grand
            Reporter:
            Adrien Grand
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development