Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-8420

Date statistics: sumOfSquares overflows long

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 5.4
    • Fix Version/s: 5.5.1, 6.0
    • Labels:
      None
    • Flags:
      Patch

      Description

      The values for Dates are large enough that squaring them overflows a "long" field. This should be converted to a double.

      StatsValuesFactory.java, line 755 DateStatsValues#updateTypeSpecificStats Add a cast to double

      sumOfSquares += ( (double)value * value * count);

      1. 0001-Fix-overflow-in-date-statistics.patch
        8 kB
        Tom Hill
      2. 0001-Fix-overflow-in-date-statistics.patch
        4 kB
        Tom Hill
      3. 0001-Fix-overflow-in-date-statistics.patch
        4 kB
        Tom Hill
      4. SOLR-8420.patch
        8 kB
        Tomás Fernández Löbbe
      5. StdDev.java
        1 kB
        Tom Hill

        Activity

        Hide
        tomsolr Tom Hill added a comment -

        Fixes overflow in stddev, too.

        Not ready to commit. I still have to fix a rounding error in TestDistributed.

        Show
        tomsolr Tom Hill added a comment - Fixes overflow in stddev, too. Not ready to commit. I still have to fix a rounding error in TestDistributed.
        Hide
        tomsolr Tom Hill added a comment -

        Just a quick demo of why TestDistributedSearch is failing, when running with the patch.

        When TestDistributedSearch#test is run with two partitions, it gets a slightly different value than when run on one partition.

        The two results are
        100010100011010110010111011100111101011001010000011100001000110
        100010100011010110010111011100111101011001010000011100001000101

        This matches the numbers seen in TestDistributedSearch.

        It looks like we need to add some delta into the compare for doubles in

        BaseDistributedSearchTestCase#public static String compare(Object a, Object b, int flags, Map<String, Integer> handle)

        Show
        tomsolr Tom Hill added a comment - Just a quick demo of why TestDistributedSearch is failing, when running with the patch. When TestDistributedSearch#test is run with two partitions, it gets a slightly different value than when run on one partition. The two results are 100010100011010110010111011100111101011001010000011100001000110 100010100011010110010111011100111101011001010000011100001000101 This matches the numbers seen in TestDistributedSearch. It looks like we need to add some delta into the compare for doubles in BaseDistributedSearchTestCase#public static String compare(Object a, Object b, int flags, Map<String, Integer> handle)
        Hide
        cpoerschke Christine Poerschke added a comment -

        sumOfSquares overflow, interesting. I wonder if DateStatsValues's private long sum = 0; might perhaps become double also (similar to NumericStatsValues's sum being a double already).

        Show
        cpoerschke Christine Poerschke added a comment - sumOfSquares overflow, interesting. I wonder if DateStatsValues 's private long sum = 0; might perhaps become double also (similar to NumericStatsValues 's sum being a double already).
        Hide
        tomsolr Tom Hill added a comment -

        Certainly could be changed. Looks like it currently would overflow if you are looking at more than 6 million+ dates, which is a pretty small number today.

        Downside is small loss of precision for smaller datasets.

        I think it probably should be changed. I'll update the patch.

        Show
        tomsolr Tom Hill added a comment - Certainly could be changed. Looks like it currently would overflow if you are looking at more than 6 million+ dates, which is a pretty small number today. Downside is small loss of precision for smaller datasets. I think it probably should be changed. I'll update the patch.
        Hide
        tomasflobbe Tomás Fernández Löbbe added a comment -

        I wonder if DateStatsValues's private long sum = 0; might perhaps become double also (similar to NumericStatsValues's sum being a double already).

        Yes, I think it should. This should also change the output, currently it's a date like:

        <date name="sum">122366-06-12T21:06:06Z</date>
        
        Show
        tomasflobbe Tomás Fernández Löbbe added a comment - I wonder if DateStatsValues's private long sum = 0; might perhaps become double also (similar to NumericStatsValues's sum being a double already). Yes, I think it should. This should also change the output, currently it's a date like: <date name= "sum" > 122366-06-12T21:06:06Z </date>
        Hide
        tomasflobbe Tomás Fernández Löbbe added a comment -

        I created SOLR-8671 for this particular change

        Show
        tomasflobbe Tomás Fernández Löbbe added a comment - I created SOLR-8671 for this particular change
        Hide
        tomasflobbe Tomás Fernández Löbbe added a comment -

        While looking at this patch I noticed that in the line 841 of TestDistributedSearch it says:

        rsp = query("q", "*:*", "rows", "0", "stats", "true",
        

        but was intended to be

        rsp = query("q", q, "rows", "0", "stats", "true",
        

        We should fix that as part of this Jira too.

        Show
        tomasflobbe Tomás Fernández Löbbe added a comment - While looking at this patch I noticed that in the line 841 of TestDistributedSearch it says: rsp = query( "q" , "*:*" , "rows" , "0" , "stats" , " true " , but was intended to be rsp = query( "q" , q, "rows" , "0" , "stats" , " true " , We should fix that as part of this Jira too.
        Hide
        tomsolr Tom Hill added a comment -

        This latest version of the path adds an allowance in tests for floating point errors in computations for specific stats.

        It also fixes the error in the test that Tomas noted.

        Show
        tomsolr Tom Hill added a comment - This latest version of the path adds an allowance in tests for floating point errors in computations for specific stats. It also fixes the error in the test that Tomas noted.
        Hide
        tomasflobbe Tomás Fernández Löbbe added a comment -

        Thanks Tom Hill, patch looks good to me, will commit shortly

        Show
        tomasflobbe Tomás Fernández Löbbe added a comment - Thanks Tom Hill , patch looks good to me, will commit shortly
        Hide
        tomasflobbe Tomás Fernández Löbbe added a comment -

        Minor change, FUZZY comparison passes in case of both numbers being NaN, made the accepted ratio a constant and added some javadocs to FUZZY flag

        Show
        tomasflobbe Tomás Fernández Löbbe added a comment - Minor change, FUZZY comparison passes in case of both numbers being NaN, made the accepted ratio a constant and added some javadocs to FUZZY flag
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 730d10f145378b164a93d63b82a02dcf7f2fdf14 in lucene-solr's branch refs/heads/master from Tomas Fernandez Lobbe
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=730d10f ]

        SOLR-8420: Fix long overflow in sumOfSquares for Date statistics

        Casted operations to double. Changed the test to support a percentage error given the FUZZY flag in doubles

        Show
        jira-bot ASF subversion and git services added a comment - Commit 730d10f145378b164a93d63b82a02dcf7f2fdf14 in lucene-solr's branch refs/heads/master from Tomas Fernandez Lobbe [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=730d10f ] SOLR-8420 : Fix long overflow in sumOfSquares for Date statistics Casted operations to double. Changed the test to support a percentage error given the FUZZY flag in doubles
        Hide
        anshumg Anshum Gupta added a comment -

        Reopening to back port for 5.5.1

        Show
        anshumg Anshum Gupta added a comment - Reopening to back port for 5.5.1
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 2beccf469f9e07eb5a05fef9ec3f869d6da4008a in lucene-solr's branch refs/heads/branch_5x from Tomas Fernandez Lobbe
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2beccf4 ]

        SOLR-8420: Fix long overflow in sumOfSquares for Date statistics

        Casted operations to double. Changed the test to support a percentage error given the FUZZY flag in doubles

        Show
        jira-bot ASF subversion and git services added a comment - Commit 2beccf469f9e07eb5a05fef9ec3f869d6da4008a in lucene-solr's branch refs/heads/branch_5x from Tomas Fernandez Lobbe [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2beccf4 ] SOLR-8420 : Fix long overflow in sumOfSquares for Date statistics Casted operations to double. Changed the test to support a percentage error given the FUZZY flag in doubles
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit f9acafbd917b7970b29f12e0c637612d2cd216f7 in lucene-solr's branch refs/heads/branch_5_5 from Tomas Fernandez Lobbe
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f9acafb ]

        SOLR-8420: Fix long overflow in sumOfSquares for Date statistics

        Casted operations to double. Changed the test to support a percentage error given the FUZZY flag in doubles

        Show
        jira-bot ASF subversion and git services added a comment - Commit f9acafbd917b7970b29f12e0c637612d2cd216f7 in lucene-solr's branch refs/heads/branch_5_5 from Tomas Fernandez Lobbe [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f9acafb ] SOLR-8420 : Fix long overflow in sumOfSquares for Date statistics Casted operations to double. Changed the test to support a percentage error given the FUZZY flag in doubles

          People

          • Assignee:
            tomasflobbe Tomás Fernández Löbbe
            Reporter:
            tomsolr Tom Hill
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development