  1. Hadoop Common
  2. HADOOP-18426

Improve the accuracy of MutableStat mean


Details

    • Reviewed

    Description

      The current MutableStat mean calculation is prone to losing accuracy when the sum of samples becomes very large.
      Storing large integers in the double type loses precision. For example, 9223372036854775707 and 9223372036854775708 are both stored as the double 9223372036854776000. Therefore, we should avoid computing the average from a cumulative total and instead update the average each time a sample is added. In short, processing each sample on its own improves the accuracy of the mean.
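
      The idea can be illustrated with a minimal sketch. This is not the actual MutableStat patch; the class name RunningMean and its methods are hypothetical and only show one common way to update a mean per sample (mean += (x - mean) / n) instead of dividing a cumulative sum. The helper collapsesInDouble reproduces the precision loss from the example in this description.

```java
// Hypothetical illustration only; not the Hadoop MutableStat implementation.
class RunningMean {
    private long count = 0;     // number of samples seen so far
    private double mean = 0.0;  // current mean of all samples

    /** Adds one sample and updates the mean without keeping a running sum. */
    void add(double x) {
        count++;
        // Move the mean toward x by 1/count of the distance. The stored
        // value stays near the magnitude of the samples, not of their sum.
        mean += (x - mean) / count;
    }

    double mean() {
        return mean;
    }

    long count() {
        return count;
    }

    /** Baseline for comparison: mean computed from a cumulative sum. */
    static double meanFromSum(double[] samples) {
        double sum = 0.0;
        for (double x : samples) {
            sum += x;  // the sum can grow far beyond any single sample
        }
        return sum / samples.length;
    }

    /** Returns true if two distinct longs map to the same double value. */
    static boolean collapsesInDouble(long a, long b) {
        return a != b && (double) a == (double) b;
    }
}
```

      With this sketch, adding the samples 1, 2, 3, 4 one at a time yields a mean of 2.5, and collapsesInDouble(9223372036854775707L, 9223372036854775708L) returns true. The per-sample update still rounds on every step, so it reduces the error caused by a very large sum rather than eliminating floating-point error entirely.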

          People

            zhangshuyan Shuyan Zhang
