Solr / SOLR-7818

Distributed stats is only calculated with the terms that are present in the last shard of a distributed request

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.0, 5.1, 5.2, 5.2.1
    • Fix Version/s: 5.3
    • Component/s: None
    • Labels: None

      Description

      In ExactStatsCache#mergeToGlobalStats we iterate over the n shard responses and merge the termStats and colStats. The terms, however, are not merged: on every iteration we put the current response's terms into the map, so each put overwrites the previous one and only the terms from the last shard response are used.

      As a result, some terms can be left out of the distributed IDF calculation: the last shard might not contain a term even though the other shards do.
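
To make the overwrite concrete, here is a minimal, self-contained sketch of the pattern described above. It is not the Solr source: the class name, the TERMS_KEY constant, the comma-separated term encoding, and the plain Map standing in for the request context are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a plain Map stands in for the request context, and each
// shard response is reduced to a comma-separated string of the terms it reported.
public class LastShardWinsSketch {

    // Hypothetical key name; the real constant in Solr may differ.
    static final String TERMS_KEY = "terms";

    // Mimics the pattern from the description: put() inside the loop replaces
    // whatever the previous shard stored under the same key.
    static String mergeByOverwriting(List<String> shardTerms) {
        Map<String, Object> context = new HashMap<>();
        for (String terms : shardTerms) {
            if (terms != null) {
                context.put(TERMS_KEY, terms);
            }
        }
        return (String) context.get(TERMS_KEY);
    }

    public static void main(String[] args) {
        // The first two shards contain "text:solr"; the last shard does not.
        List<String> responses =
                Arrays.asList("text:solr,text:lucene", "text:solr", "text:lucene");
        System.out.println(mergeByOverwriting(responses)); // prints "text:lucene"
    }
}
```

In this toy run the term text:solr is reported by two of the three shards, yet it is absent from the merged result because the last response did not include it.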

      1. SOLR-7818.patch
        5 kB
        Varun Thacker
      2. SOLR-7818.patch
        3 kB
        Varun Thacker

        Activity

        Varun Thacker added a comment -

        Patch in which the terms are computed and added at the merge stage, instead of each shard sending its terms and the merge stage merging them.

        Anshum Gupta added a comment -

        We should merge the terms from the shard responses into a set and add that set outside of the loop, rather than calling createNormalizedWeight and extractTerms. The createNormalizedWeight/extractTerms approach could potentially return only local terms, depending upon the query parser.

        Also, having a test would be nice so we don't regress.
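
The suggestion of collecting the shard terms into a set and storing the result once, outside the loop, can be sketched roughly as follows. This is an illustration under assumptions, not the committed patch: the class and method names are hypothetical, and each shard response is reduced to a comma-separated string of terms.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: names and the comma-separated encoding are assumptions.
public class MergeTermsSketch {

    // Collects the terms of every shard response into one set; the caller
    // stores the result once, after the loop has seen all responses.
    static Set<String> mergeTerms(List<String> shardTerms) {
        Set<String> allTerms = new LinkedHashSet<>();
        for (String terms : shardTerms) {
            if (terms == null || terms.isEmpty()) {
                continue; // this shard reported no terms
            }
            allTerms.addAll(Arrays.asList(terms.split(",")));
        }
        return allTerms;
    }

    public static void main(String[] args) {
        // The last shard does not contain "text:solr", but the union keeps it.
        List<String> responses =
                Arrays.asList("text:solr,text:lucene", "text:solr", "text:lucene");
        System.out.println(mergeTerms(responses)); // prints [text:solr, text:lucene]
    }
}
```

Because the set is a union, the order of the shard responses no longer decides which terms survive; a term reported by any shard is kept.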

        Varun Thacker added a comment -

        Thanks Anshum for your feedback.

        Yeah, the earlier method wouldn't have worked for query parsers such as MLTQueryParser.

        As for a test case, I've added TestDistributedIDF#testMultiCollectionQuery. This test exposes the problem.

        ASF subversion and git services added a comment -

        Commit 1694210 from Varun Thacker in branch 'dev/trunk'
        [ https://svn.apache.org/r1694210 ]

        SOLR-7818: Distributed stats is only calculated with the terms that are present in the last shard of a distributed request

        ASF subversion and git services added a comment -

        Commit 1694211 from Varun Thacker in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1694211 ]

        SOLR-7818: Distributed stats is only calculated with the terms that are present in the last shard of a distributed request (merged from trunk r1694210)

        ASF subversion and git services added a comment -

        Commit 1694213 from Varun Thacker in branch 'dev/trunk'
        [ https://svn.apache.org/r1694213 ]

        SOLR-7818 SOLR-7756 Added better descriptions in the CHANGES entry for these two issues

        ASF subversion and git services added a comment -

        Commit 1694214 from Varun Thacker in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1694214 ]

        SOLR-7818 SOLR-7756 Added better descriptions in the CHANGES entry for these two issues (merged from trunk r1694213)

        Varun Thacker added a comment -

        Thanks Anshum for the review.

        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release


          People

          • Assignee: Varun Thacker
          • Reporter: Varun Thacker
          • Votes: 0
          • Watchers: 5
