Accumulo
  1. Accumulo
  2. ACCUMULO-2494

Stat calculation of STDEV may be inaccurate

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.2, 1.6.0
    • Component/s: client
    • Labels:
      None

      Description

      The math is sound, but it is susceptible to rounding errors. We should address that.

      See http://www.strchr.com/standard_deviation_in_one_pass and http://www.cs.berkeley.edu/~mhoemmen/cs194/Tutorials/variance.pdf

        Issue Links

          Activity

          Hide
          ASF subversion and git services added a comment -

          Commit 8f5728084901bcc0b0ca7e4e74f3a8ec18b0d8d2 in accumulo's branch refs/heads/master from Mike Drob
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=8f57280 ]

          ACCUMULO-2494 add commons-math to 1.6 tarball

          Show
          ASF subversion and git services added a comment - Commit 8f5728084901bcc0b0ca7e4e74f3a8ec18b0d8d2 in accumulo's branch refs/heads/master from Mike Drob [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=8f57280 ] ACCUMULO-2494 add commons-math to 1.6 tarball
          Hide
          ASF subversion and git services added a comment -

          Commit 8f5728084901bcc0b0ca7e4e74f3a8ec18b0d8d2 in accumulo's branch refs/heads/1.6.0-SNAPSHOT from Mike Drob
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=8f57280 ]

          ACCUMULO-2494 add commons-math to 1.6 tarball

          Show
          ASF subversion and git services added a comment - Commit 8f5728084901bcc0b0ca7e4e74f3a8ec18b0d8d2 in accumulo's branch refs/heads/1.6.0-SNAPSHOT from Mike Drob [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=8f57280 ] ACCUMULO-2494 add commons-math to 1.6 tarball
          Hide
          Josh Elser added a comment -

          Mike Drob, ahh, gotcha. Thanks for taking care of this!

          Show
          Josh Elser added a comment - Mike Drob , ahh, gotcha. Thanks for taking care of this!
          Hide
          ASF subversion and git services added a comment -

          Commit 00355d0e02f3877558d9668bb75db6d703e860ef in accumulo's branch refs/heads/master from Mike Drob
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=00355d0 ]

          ACCUMULO-2494 add commons-math to 1.5

          Show
          ASF subversion and git services added a comment - Commit 00355d0e02f3877558d9668bb75db6d703e860ef in accumulo's branch refs/heads/master from Mike Drob [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=00355d0 ] ACCUMULO-2494 add commons-math to 1.5
          Mike Drob made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Mike Drob made changes -
          Link This issue incorporates ACCUMULO-2546 [ ACCUMULO-2546 ]
          Hide
          ASF subversion and git services added a comment -

          Commit 00355d0e02f3877558d9668bb75db6d703e860ef in accumulo's branch refs/heads/1.6.0-SNAPSHOT from Mike Drob
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=00355d0 ]

          ACCUMULO-2494 add commons-math to 1.5

          Show
          ASF subversion and git services added a comment - Commit 00355d0e02f3877558d9668bb75db6d703e860ef in accumulo's branch refs/heads/1.6.0-SNAPSHOT from Mike Drob [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=00355d0 ] ACCUMULO-2494 add commons-math to 1.5
          Hide
          ASF subversion and git services added a comment -

          Commit 00355d0e02f3877558d9668bb75db6d703e860ef in accumulo's branch refs/heads/1.5.2-SNAPSHOT from Mike Drob
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=00355d0 ]

          ACCUMULO-2494 add commons-math to 1.5

          Show
          ASF subversion and git services added a comment - Commit 00355d0e02f3877558d9668bb75db6d703e860ef in accumulo's branch refs/heads/1.5.2-SNAPSHOT from Mike Drob [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=00355d0 ] ACCUMULO-2494 add commons-math to 1.5
          Hide
          Mike Drob added a comment -

          Josh Elser - No, my next question was going to be how this ever worked, since I thought we already had a dependency on commons-math and that's why this change was safe to include. However, we only had it in the parent dependency management section, and an explicit dependency under test, so I'm not sure how this even compiled on my machine. Probably a transitive dependency somewhere...

          Looks like Eric Newton beat me to adding it to the pom, but I'll verify the tarball and generated RPMs/DEBs as well.

          Show
          Mike Drob added a comment - Josh Elser - No, my next question was going to be how this ever worked, since I thought we already had a dependency on commons-math and that's why this change was safe to include. However, we only had it in the parent dependency management section, and an explicit dependency under test, so I'm not sure how this even compiled on my machine. Probably a transitive dependency somewhere... Looks like Eric Newton beat me to adding it to the pom, but I'll verify the tarball and generated RPMs/DEBs as well.
          Hide
          ASF subversion and git services added a comment -

          Commit 9f6cc39015a7533837f35f9af5f4666d52b3c6cb in accumulo's branch refs/heads/master from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=9f6cc39 ]

          ACCUMULO-2494 add dependency on commons-math, in sorted order

          Show
          ASF subversion and git services added a comment - Commit 9f6cc39015a7533837f35f9af5f4666d52b3c6cb in accumulo's branch refs/heads/master from Eric Newton [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=9f6cc39 ] ACCUMULO-2494 add dependency on commons-math, in sorted order
          Hide
          ASF subversion and git services added a comment -

          Commit 9f6cc39015a7533837f35f9af5f4666d52b3c6cb in accumulo's branch refs/heads/1.6.0-SNAPSHOT from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=9f6cc39 ]

          ACCUMULO-2494 add dependency on commons-math, in sorted order

          Show
          ASF subversion and git services added a comment - Commit 9f6cc39015a7533837f35f9af5f4666d52b3c6cb in accumulo's branch refs/heads/1.6.0-SNAPSHOT from Eric Newton [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=9f6cc39 ] ACCUMULO-2494 add dependency on commons-math, in sorted order
          Hide
          ASF subversion and git services added a comment -

          Commit 4bf28d99f21b6620ffd063fd82499fb2193cb508 in accumulo's branch refs/heads/master from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=4bf28d9 ]

          ACCUMULO-2494 add dependency on commons-math

          Show
          ASF subversion and git services added a comment - Commit 4bf28d99f21b6620ffd063fd82499fb2193cb508 in accumulo's branch refs/heads/master from Eric Newton [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=4bf28d9 ] ACCUMULO-2494 add dependency on commons-math
          Hide
          ASF subversion and git services added a comment -

          Commit 4bf28d99f21b6620ffd063fd82499fb2193cb508 in accumulo's branch refs/heads/1.6.0-SNAPSHOT from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=4bf28d9 ]

          ACCUMULO-2494 add dependency on commons-math

          Show
          ASF subversion and git services added a comment - Commit 4bf28d99f21b6620ffd063fd82499fb2193cb508 in accumulo's branch refs/heads/1.6.0-SNAPSHOT from Eric Newton [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=4bf28d9 ] ACCUMULO-2494 add dependency on commons-math
          Hide
          Josh Elser added a comment -

          Which version of Hadoop did you use?

          Apache Hadoop 2.3.0. Looks like they bundle a commons-math3 (which contains a org/apache/commons/math3/stat/descriptive/rank/Min but not org/apache/commons/math/stat/descriptive/rank/Min). Taking a step to assume that you're asking because you expected it to be provided by Hadoop.. haven't we moved away from this due to the lessons learned from 1.5.0 (the original emphasis that since we're using it directly, we should provide it directly)?

          Show
          Josh Elser added a comment - Which version of Hadoop did you use? Apache Hadoop 2.3.0. Looks like they bundle a commons-math3 (which contains a org/apache/commons/math3/stat/descriptive/rank/Min but not org/apache/commons/math/stat/descriptive/rank/Min ). Taking a step to assume that you're asking because you expected it to be provided by Hadoop.. haven't we moved away from this due to the lessons learned from 1.5.0 (the original emphasis that since we're using it directly, we should provide it directly)?
          Hide
          Mike Drob added a comment -

          Which version of Hadoop did you use?

          Show
          Mike Drob added a comment - Which version of Hadoop did you use?
          Hide
          Josh Elser added a comment -

          Can we do the packaging work in a follow-on issue?

          You broke installations in this ticket. I would much rather it be fixed here as it's not follow-on work.

          Show
          Josh Elser added a comment - Can we do the packaging work in a follow-on issue? You broke installations in this ticket. I would much rather it be fixed here as it's not follow-on work.
          Hide
          Mike Drob added a comment -

          Can we do the packaging work in a follow-on issue?

          Show
          Mike Drob added a comment - Can we do the packaging work in a follow-on issue?
          Josh Elser made changes -
          Assignee Mike Drob [ mdrob ]
          Josh Elser made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Hide
          Josh Elser added a comment -

          Just deployed a fresh copy of 1.6.0 and got a failure because commons-math was not in the tarball. assembly/src/main/asssemblies/component.xml needs to be updated to include this direct dependency to the tarball.

          RPM and DEB should also be checked to ensure the dependency is present there.

          Show
          Josh Elser added a comment - Just deployed a fresh copy of 1.6.0 and got a failure because commons-math was not in the tarball. assembly/src/main/asssemblies/component.xml needs to be updated to include this direct dependency to the tarball. RPM and DEB should also be checked to ensure the dependency is present there.
          Hide
          ASF subversion and git services added a comment -

          Commit ce6a7216d3d41966f15c3be2d2099aac1797124f in accumulo's branch refs/heads/master from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=ce6a721 ]

          ACCUMULO-2494 fix rat check failure

          Show
          ASF subversion and git services added a comment - Commit ce6a7216d3d41966f15c3be2d2099aac1797124f in accumulo's branch refs/heads/master from Eric Newton [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=ce6a721 ] ACCUMULO-2494 fix rat check failure
          Hide
          Mike Drob added a comment -

          cherry-picked Eric Newton's rat fix back to 1.5.2.

          Show
          Mike Drob added a comment - cherry-picked Eric Newton 's rat fix back to 1.5.2.
          Hide
          ASF subversion and git services added a comment -

          Commit ce6a7216d3d41966f15c3be2d2099aac1797124f in accumulo's branch refs/heads/1.6.0-SNAPSHOT from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=ce6a721 ]

          ACCUMULO-2494 fix rat check failure

          Show
          ASF subversion and git services added a comment - Commit ce6a7216d3d41966f15c3be2d2099aac1797124f in accumulo's branch refs/heads/1.6.0-SNAPSHOT from Eric Newton [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=ce6a721 ] ACCUMULO-2494 fix rat check failure
          Hide
          ASF subversion and git services added a comment -

          Commit ce6a7216d3d41966f15c3be2d2099aac1797124f in accumulo's branch refs/heads/1.5.2-SNAPSHOT from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=ce6a721 ]

          ACCUMULO-2494 fix rat check failure

          Show
          ASF subversion and git services added a comment - Commit ce6a7216d3d41966f15c3be2d2099aac1797124f in accumulo's branch refs/heads/1.5.2-SNAPSHOT from Eric Newton [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=ce6a721 ] ACCUMULO-2494 fix rat check failure
          Hide
          ASF subversion and git services added a comment -

          Commit c6163b4c4554996c928bd4be353113f6fc806493 in accumulo's branch refs/heads/master from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=c6163b4 ]

          ACCUMULO-2494 fix rat check failure

          Show
          ASF subversion and git services added a comment - Commit c6163b4c4554996c928bd4be353113f6fc806493 in accumulo's branch refs/heads/master from Eric Newton [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=c6163b4 ] ACCUMULO-2494 fix rat check failure
          Hide
          ASF subversion and git services added a comment -

          Commit c6163b4c4554996c928bd4be353113f6fc806493 in accumulo's branch refs/heads/1.6.0-SNAPSHOT from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=c6163b4 ]

          ACCUMULO-2494 fix rat check failure

          Show
          ASF subversion and git services added a comment - Commit c6163b4c4554996c928bd4be353113f6fc806493 in accumulo's branch refs/heads/1.6.0-SNAPSHOT from Eric Newton [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=c6163b4 ] ACCUMULO-2494 fix rat check failure
          Mike Drob made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Mike Drob made changes -
          Fix Version/s 1.5.2 [ 12326272 ]
          Fix Version/s 1.6.0 [ 12322468 ]
          Hide
          ASF subversion and git services added a comment -

          Commit 0dc92ca1288f521cbbeeb8bbd266fa922c845d90 in accumulo's branch refs/heads/master from Mike Drob
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=0dc92ca ]

          ACCUMULO-2494 Delegate math to commons-math

          Show
          ASF subversion and git services added a comment - Commit 0dc92ca1288f521cbbeeb8bbd266fa922c845d90 in accumulo's branch refs/heads/master from Mike Drob [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=0dc92ca ] ACCUMULO-2494 Delegate math to commons-math
          Hide
          ASF subversion and git services added a comment -

          Commit 0dc92ca1288f521cbbeeb8bbd266fa922c845d90 in accumulo's branch refs/heads/1.6.0-SNAPSHOT from Mike Drob
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=0dc92ca ]

          ACCUMULO-2494 Delegate math to commons-math

          Show
          ASF subversion and git services added a comment - Commit 0dc92ca1288f521cbbeeb8bbd266fa922c845d90 in accumulo's branch refs/heads/1.6.0-SNAPSHOT from Mike Drob [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=0dc92ca ] ACCUMULO-2494 Delegate math to commons-math
          Hide
          ASF subversion and git services added a comment -

          Commit 0dc92ca1288f521cbbeeb8bbd266fa922c845d90 in accumulo's branch refs/heads/1.5.2-SNAPSHOT from Mike Drob
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=0dc92ca ]

          ACCUMULO-2494 Delegate math to commons-math

          Show
          ASF subversion and git services added a comment - Commit 0dc92ca1288f521cbbeeb8bbd266fa922c845d90 in accumulo's branch refs/heads/1.5.2-SNAPSHOT from Mike Drob [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=0dc92ca ] ACCUMULO-2494 Delegate math to commons-math
          Hide
          Keith Turner added a comment -

          -0 I am not strongly opposed to fixing this particular issue in 1.5.X. I am concerned about making lots of changes of this nature. I think doing this increases the chances of introducing a new ciritcal bug in 1.5.X lines. I am wondering when you draw the line. If I had 20 issues like this in front of me all at once I would probably make a different decisions than looking at those same issues 1 per week over 20 weeks.

          Show
          Keith Turner added a comment - -0 I am not strongly opposed to fixing this particular issue in 1.5.X. I am concerned about making lots of changes of this nature. I think doing this increases the chances of introducing a new ciritcal bug in 1.5.X lines. I am wondering when you draw the line. If I had 20 issues like this in front of me all at once I would probably make a different decisions than looking at those same issues 1 per week over 20 weeks.
          Hide
          Mike Drob added a comment -

          I should have been more clear.

          I have seen multiple times in the past where small seemingly innocuous changes for minor bugs have introduced critical bugs. In this case TabletServer uses the Stat class, but does not use the std deviation. The risk is a small possibility of introducing a new critical bug in tserver if the change breaks Stat in some strange new way. The benefit of the change is that informational output from a few test may be better.

          Based on this comment, I do not understand which version you would support including the new implementation of Stat in. I also do not understand if you would support fixing stdev in older versions while leaving everything else the same. FWIW, my implementation and the commons-math implementation were the same (theirs was a bit more general and better documented, but otherwise identical).

          The reason I asked for clarification is that I cannot tell if you are -1 or -0 on including the full change in 1.5.x.

          Something else that would cause problems would be if the Apache library stored all of the data you were computing stats for. I assume it does not do this, but would have to inspect the code to be sure.

          It does not store the data, so this is not an issue.

          Show
          Mike Drob added a comment - I should have been more clear. I have seen multiple times in the past where small seemingly innocuous changes for minor bugs have introduced critical bugs. In this case TabletServer uses the Stat class, but does not use the std deviation. The risk is a small possibility of introducing a new critical bug in tserver if the change breaks Stat in some strange new way. The benefit of the change is that informational output from a few test may be better. Based on this comment, I do not understand which version you would support including the new implementation of Stat in. I also do not understand if you would support fixing stdev in older versions while leaving everything else the same. FWIW, my implementation and the commons-math implementation were the same (theirs was a bit more general and better documented, but otherwise identical). The reason I asked for clarification is that I cannot tell if you are -1 or -0 on including the full change in 1.5.x. Something else that would cause problems would be if the Apache library stored all of the data you were computing stats for. I assume it does not do this, but would have to inspect the code to be sure. It does not store the data, so this is not an issue.
          Hide
          Keith Turner added a comment -

          I expressed my concerns on RB

          Show
          Keith Turner added a comment - I expressed my concerns on RB
          Keith Turner made changes -
          Field Original Value New Value
          Remote Link This issue links to "Review (Web Link)" [ 14623 ]
          Hide
          Sean Busbey added a comment -

          Yes, javadoc was my intention.

          Show
          Sean Busbey added a comment - Yes, javadoc was my intention.
          Hide
          Mike Drob added a comment -

          re: fix version

          The latest version of the patch delegates all calculations in Stat to commons-math. However, 1.4 does not yet have a dependency on commons-math so I am hesitant to introduce one. Sean Busbey suggested on RB that we apply a known issue note to 1.4 (presumably javadoc?) and then apply the code change in 1.5 and newer.

          Keith Turner - You expressed some concern previously, can you articulate it further?

          Show
          Mike Drob added a comment - re: fix version The latest version of the patch delegates all calculations in Stat to commons-math. However, 1.4 does not yet have a dependency on commons-math so I am hesitant to introduce one. Sean Busbey suggested on RB that we apply a known issue note to 1.4 (presumably javadoc?) and then apply the code change in 1.5 and newer. Keith Turner - You expressed some concern previously, can you articulate it further?
          Hide
          Keith Turner added a comment -

          This issue is about org.apache.accumulo.core.util.Stat

          Show
          Keith Turner added a comment - This issue is about org.apache.accumulo.core.util.Stat
          Mike Drob created issue -

            People

            • Assignee:
              Mike Drob
              Reporter:
              Mike Drob
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development