Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.8
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Collectors have the ability to declare whether or not they support out-of-order collection, but since most scorers score in order this is not well tested.

      1. LUCENE-5501.patch
        6 kB
        Adrien Grand
      2. LUCENE-5501-2.patch
        15 kB
        Adrien Grand
      3. LUCENE-5501-2.patch
        13 kB
        Adrien Grand

        Activity

        Hide
        Adrien Grand added a comment -

        Here is a simple patch I've been playing with:

        • AssertingWeight.scoresDocsOutOfOrder randomly returns true in order to trigger the use of our top docs collectors that tie-break on doc id,
        • AssertingScorer randomly scores in random order when the collector says it supports it

        It found a bug in the grouping collector whose acceptDocsOutOfOrder method returns true although the collect method has a comment that explicitely says that the comparison works because doc IDs come in order.

        Show
        Adrien Grand added a comment - Here is a simple patch I've been playing with: AssertingWeight.scoresDocsOutOfOrder randomly returns true in order to trigger the use of our top docs collectors that tie-break on doc id, AssertingScorer randomly scores in random order when the collector says it supports it It found a bug in the grouping collector whose acceptDocsOutOfOrder method returns true although the collect method has a comment that explicitely says that the comparison works because doc IDs come in order.
        Hide
        Robert Muir added a comment -

        Are you sure? I think its ok overall, but of course could be better:

        From a line coverage perspective, most of these are at or very close to 100%

        Looking at test contributions against each, all the collectors in TopScore* look like they get beat up pretty well.
        TopField* is not so great, but with several tests trying to explicitly iterate over all of them (TestSearchAfter, TestExpressionSorts, etc). More tests for these sorting ones might be good, but I don't think the situation is so bad?

        https://builds.apache.org/job/Lucene-Solr-Clover-trunk/clover/org/apache/lucene/search/TopScoreDocCollector.html#TopScoreDocCollector
        https://builds.apache.org/job/Lucene-Solr-Clover-trunk/clover/org/apache/lucene/search/TopFieldCollector.html#TopFieldCollector

        Show
        Robert Muir added a comment - Are you sure? I think its ok overall, but of course could be better: From a line coverage perspective, most of these are at or very close to 100% Looking at test contributions against each, all the collectors in TopScore* look like they get beat up pretty well. TopField* is not so great, but with several tests trying to explicitly iterate over all of them (TestSearchAfter, TestExpressionSorts, etc). More tests for these sorting ones might be good, but I don't think the situation is so bad? https://builds.apache.org/job/Lucene-Solr-Clover-trunk/clover/org/apache/lucene/search/TopScoreDocCollector.html#TopScoreDocCollector https://builds.apache.org/job/Lucene-Solr-Clover-trunk/clover/org/apache/lucene/search/TopFieldCollector.html#TopFieldCollector
        Hide
        Adrien Grand added a comment -

        I was not thinking about these collectors at all, I think they are very-well tested indeed! I was more thinking about more exotic collectors, like those that are used for grouping or joins that won't get out-of-order testing unless they use a boolean query.

        Show
        Adrien Grand added a comment - I was not thinking about these collectors at all, I think they are very-well tested indeed! I was more thinking about more exotic collectors, like those that are used for grouping or joins that won't get out-of-order testing unless they use a boolean query.
        Hide
        Michael McCandless added a comment -

        +1, I love this patch. You shuffle the docIDs from the scorer before delivering to the collector, if the collector claims it can accept out-of-order hits.

        LUCENE-4950 is related: I tried to fix AssertingIndexSearcher to use AssertingCollector but hit strange exceptions with ConstantScoreQuery that I never explained. AssertingCollector would verify that if the collector said it could not accept docs out of order, then the scorer does not in fact deliver docs out of order.

        Also, LUCENE-5487 will increase how often out-of-order scoring is "allowed", because BooleanScorer will now allow the sub-scorers to score out of order.

        Show
        Michael McCandless added a comment - +1, I love this patch. You shuffle the docIDs from the scorer before delivering to the collector, if the collector claims it can accept out-of-order hits. LUCENE-4950 is related: I tried to fix AssertingIndexSearcher to use AssertingCollector but hit strange exceptions with ConstantScoreQuery that I never explained. AssertingCollector would verify that if the collector said it could not accept docs out of order, then the scorer does not in fact deliver docs out of order. Also, LUCENE-5487 will increase how often out-of-order scoring is "allowed", because BooleanScorer will now allow the sub-scorers to score out of order.
        Hide
        ASF subversion and git services added a comment -

        Commit 1575873 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1575873 ]

        LUCENE-5501: AssertingScorer now randomly collects documents in random order when the collector says it supports it.

        Show
        ASF subversion and git services added a comment - Commit 1575873 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1575873 ] LUCENE-5501 : AssertingScorer now randomly collects documents in random order when the collector says it supports it.
        Hide
        ASF subversion and git services added a comment -

        Commit 1575874 from Adrien Grand in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1575874 ]

        LUCENE-5501: AssertingScorer now randomly collects documents in random order when the collector says it supports it.

        Show
        ASF subversion and git services added a comment - Commit 1575874 from Adrien Grand in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1575874 ] LUCENE-5501 : AssertingScorer now randomly collects documents in random order when the collector says it supports it.
        Hide
        Adrien Grand added a comment -

        Thanks for reviewing, Mike!

        Show
        Adrien Grand added a comment - Thanks for reviewing, Mike!
        Hide
        Adrien Grand added a comment -

        Thinking about it again, I realized it would be possible to make it better by wrapping the collector. This way out-of-order scoring can be tested against a larger variety of scorers such as ConstantQuery's scorer (which is currently untested because it overrides score(Collector)).

        Show
        Adrien Grand added a comment - Thinking about it again, I realized it would be possible to make it better by wrapping the collector. This way out-of-order scoring can be tested against a larger variety of scorers such as ConstantQuery's scorer (which is currently untested because it overrides score(Collector) ).
        Hide
        Adrien Grand added a comment -

        Updated patch following the BulkScorer refactoring.

        Show
        Adrien Grand added a comment - Updated patch following the BulkScorer refactoring.
        Hide
        ASF subversion and git services added a comment -

        Commit 1584829 from jpountz@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1584829 ]

        LUCENE-5501: Improved out-of-order collection testing.

        Show
        ASF subversion and git services added a comment - Commit 1584829 from jpountz@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1584829 ] LUCENE-5501 : Improved out-of-order collection testing.
        Hide
        ASF subversion and git services added a comment -

        Commit 1584844 from jpountz@apache.org in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1584844 ]

        LUCENE-5501: Improved out-of-order collection testing.

        Show
        ASF subversion and git services added a comment - Commit 1584844 from jpountz@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1584844 ] LUCENE-5501 : Improved out-of-order collection testing.
        Hide
        Uwe Schindler added a comment -

        Close issue after release of 4.8.0

        Show
        Uwe Schindler added a comment - Close issue after release of 4.8.0

          People

          • Assignee:
            Adrien Grand
            Reporter:
            Adrien Grand
          • Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development