Commons Statistics / STATISTICS-81

Implement descriptive statistics for integer types

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: descriptive
    • Labels: None
    • Easy

    Description

      The descriptive module defines the DoubleStatistic as a composite interface:

      public interface DoubleStatistic extends DoubleConsumer, DoubleSupplier

      The implementations compute on a stream of double values, and provide factory methods to compute on a double[].

      The moment-based statistics (mean, variance, skewness, kurtosis) use a rolling algorithm for numerical stability when computing on a stream, and perform a two-pass computation over array data.
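
      For illustration, a rolling update of the mean and variance can be sketched as below. This is a minimal sketch using Welford's updating method, not the module's actual implementation; the class name RollingVariance and its mean() method are hypothetical. It returns the population variance (divisor n) to match the formula used in this issue.

```java
import java.util.function.DoubleConsumer;
import java.util.function.DoubleSupplier;

/** Hypothetical sketch of a rolling statistic; not the Commons Statistics implementation. */
final class RollingVariance implements DoubleConsumer, DoubleSupplier {
    private long n;       // number of observations seen so far
    private double mean;  // running mean
    private double m2;    // running sum of squared deviations from the mean

    @Override
    public void accept(double x) {
        n++;
        // Update the mean by the scaled deviation rather than re-summing,
        // which avoids accumulating one large, imprecise total.
        double delta = x - mean;
        mean += delta / n;
        // Uses the deviation from both the old and the new mean (Welford's update).
        m2 += delta * (x - mean);
    }

    /** Returns the population variance (divisor n), or NaN if no values were added. */
    @Override
    public double getAsDouble() {
        return n == 0 ? Double.NaN : m2 / n;
    }

    /** Returns the running mean, or NaN if no values were added. */
    double mean() {
        return n == 0 ? Double.NaN : mean;
    }
}
```

      Because the class is a DoubleConsumer, it can be passed to DoubleStream.forEach; the result is then read with getAsDouble().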

      Several of the statistics can be efficiently computed for integer types using different algorithms. For example:

      mean = sum(x) / n
      variance = 1/n * [ sum(x^2) - (sum(x))^2 / n ]
      

      These can be computed in a single pass and are not subject to loss of precision, provided the sums are accumulated with enough integer bits. For a stream, the sums must be able to accept up to 2^63 observations. For an array, the maximum number of observations is ~2^31, which allows some optimisation. The sums can avoid BigInteger by using special implementations of signed and unsigned summation arithmetic. The final statistic can then be computed using extended floating-point precision (e.g. double-double).
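
      A rough sketch of the idea for int values from an array follows. It is an illustration under stated assumptions, not a proposed API: the class name IntMoments is hypothetical, the count is assumed to stay within the array limit of ~2^31 so that sum(x) fits in a signed long (at most 2^31 values of magnitude 2^31 give at most 2^62), and sum(x^2) is held in a hand-rolled unsigned 128-bit accumulator. For simplicity the final division uses BigInteger and BigDecimal once, in place of the double-double arithmetic suggested above. The variance is rearranged as [ n * sum(x^2) - (sum(x))^2 ] / n^2 so that the numerator is an exact integer.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

/** Hypothetical sketch of exact integer sums; assumes at most ~2^31 int observations. */
final class IntMoments {
    private long n;        // number of observations
    private long sum;      // sum(x); exact under the ~2^31 observation assumption
    private long sumSqLo;  // low 64 bits of sum(x^2), treated as unsigned
    private long sumSqHi;  // high 64 bits of sum(x^2)

    void accept(int x) {
        n++;
        sum += x;
        long sq = (long) x * x;  // at most 2^62, so the product cannot overflow
        sumSqLo += sq;
        // If the unsigned low word wrapped around, carry into the high word.
        if (Long.compareUnsigned(sumSqLo, sq) < 0) {
            sumSqHi++;
        }
    }

    /** mean = sum(x) / n, rounded once at the end. */
    double mean() {
        if (n == 0) {
            return Double.NaN;
        }
        return BigDecimal.valueOf(sum)
            .divide(BigDecimal.valueOf(n), MathContext.DECIMAL128)
            .doubleValue();
    }

    /** Population variance = [ n * sum(x^2) - (sum(x))^2 ] / n^2. */
    double variance() {
        if (n == 0) {
            return Double.NaN;
        }
        BigInteger bn = BigInteger.valueOf(n);
        BigInteger s = BigInteger.valueOf(sum);
        // Reassemble the 128-bit sum of squares from its two words.
        BigInteger sumSq = BigInteger.valueOf(sumSqHi).shiftLeft(64)
            .add(new BigInteger(Long.toUnsignedString(sumSqLo)));
        BigInteger numerator = bn.multiply(sumSq).subtract(s.multiply(s));
        return new BigDecimal(numerator)
            .divide(new BigDecimal(bn.multiply(bn)), MathContext.DECIMAL128)
            .doubleValue();
    }
}
```

      For the values 1, 2, 3, 4 this gives a mean of 2.5 and a variance of 1.25. The accumulation loop itself uses only long arithmetic; the arbitrary-precision types appear only in the final step.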

       

            People

              Assignee: Unassigned
              Reporter: Alex Herbert (aherbert)