Hadoop Common / HADOOP-18426

Improve the accuracy of MutableStat mean


Details

    • Reviewed

    Description

      The current MutableStat mean calculation is prone to loss of accuracy because the running sum of samples can become very large.
      Storing large integers in a double loses precision: a double has a 53-bit significand, so integers above 2^53 cannot all be represented exactly. For example, 9223372036854775707 and 9223372036854775708 both round to the same double, 2^63 = 9223372036854775808 (printed as 9.223372036854776E18). We should therefore avoid computing the mean from a cumulative total and instead update the mean each time a sample is added. Processing each sample on its own keeps intermediate values close to the magnitude of the samples rather than of their total, which improves the accuracy of the mean.
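      Both points above can be checked with a small, self-contained Java sketch. `RunningMeanDemo`, `SumMean` and `RunningMean` are hypothetical names, not Hadoop's `MutableStat` API or the patch attached to this issue. The sketch assumes one sample per call and only contrasts a mean computed from a running double total with one updated incrementally as mean += (x - mean) / n.

```java
// Hypothetical illustration only: these classes are not part of Hadoop.
// They contrast a running-total mean with an incrementally updated mean.
public class RunningMeanDemo {

    /** Mean computed as (sum of samples) / count, with the sum held in a double. */
    static final class SumMean {
        private double total = 0.0;
        private long count = 0;

        void add(long sample) {
            total += sample; // the total keeps growing, so its rounding error grows too
            count++;
        }

        double mean() {
            return count > 0 ? total / count : 0.0;
        }
    }

    /** Mean updated per sample: mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n. */
    static final class RunningMean {
        private double mean = 0.0;
        private long count = 0;

        void add(long sample) {
            count++;
            // Only the difference from the current mean is scaled and added,
            // so intermediate values stay on the order of the samples themselves.
            mean += (sample - mean) / count;
        }

        double mean() {
            return mean;
        }
    }

    public static void main(String[] args) {
        // The two longs from the description round to the same double, 2^63.
        System.out.println((double) 9223372036854775707L == (double) 9223372036854775708L);

        // Exact mean of these samples is (2^53 + 3) / 4 = 2251799813685248.75.
        long[] samples = {1L << 53, 1L, 1L, 1L};
        SumMean bySum = new SumMean();
        RunningMean incremental = new RunningMean();
        for (long s : samples) {
            bySum.add(s);
            incremental.add(s);
        }
        System.out.printf("sum-based:   %.2f%n", bySum.mean());
        System.out.printf("incremental: %.2f%n", incremental.mean());
    }
}
```

      With the input {2^53, 1, 1, 1}, each added 1 is rounded away because 2^53 + 1 is not representable, so the running total stays at 2^53 and the sum-based mean is 2251799813685248.0. The incremental mean comes out as 2251799813685248.5, closer to the exact 2251799813685248.75. The incremental update still rounds at every step, so it narrows the error rather than eliminating it.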


          People

            Assignee: Shuyan Zhang (zhangshuyan)
            Reporter: Shuyan Zhang (zhangshuyan)
            Votes: 0
            Watchers: 2

