Details
- Improvement
- Status: Open
- Minor
- Resolution: Unresolved
- None
- None
- None
- Easy
Description
The descriptive module defines DoubleStatistic as a composite interface:
public interface DoubleStatistic extends DoubleConsumer, DoubleSupplier
The implementations compute on a stream of double values, and provide factory methods to compute on a double[].
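As a minimal sketch of this shape, the interface below is re-declared locally so the example compiles on its own. MaxSketch and its create/of factory names are hypothetical and chosen only for illustration; they are not taken from the module.

```java
import java.util.function.DoubleConsumer;
import java.util.function.DoubleSupplier;

// Local copy of the interface shape quoted in this ticket.
interface DoubleStatistic extends DoubleConsumer, DoubleSupplier {
}

// Hypothetical implementation for illustration only: tracks the maximum value.
final class MaxSketch implements DoubleStatistic {
    private double max = Double.NEGATIVE_INFINITY;

    // Creates an empty instance for use with a stream of values.
    static MaxSketch create() {
        return new MaxSketch();
    }

    // Factory method that computes on a double[].
    static MaxSketch of(double... values) {
        MaxSketch s = new MaxSketch();
        for (double v : values) {
            s.accept(v);
        }
        return s;
    }

    @Override
    public void accept(double value) {
        max = Math.max(max, value);
    }

    @Override
    public double getAsDouble() {
        return max;
    }
}
```

Because the statistic is a DoubleConsumer, an instance can be passed directly to DoubleStream.forEach, and the result is then read through getAsDouble().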
The moment-based statistics (mean, variance, skewness, kurtosis) use a rolling algorithm for numerical stability when computing on a stream, and perform a two-pass computation over array data.
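The two approaches can be sketched for the mean as follows. This is an illustration under assumptions, not the module's code: MeanSketch and its method names are hypothetical, and the second pass shown here (adding the mean of the residuals as a correction) is one common two-pass variant.

```java
// Hypothetical class for illustration; not part of any library API.
final class MeanSketch {
    // Rolling update suitable for a stream: m_n = m_(n-1) + (x - m_(n-1)) / n.
    static double rollingMean(double[] values) {
        double m = 0;
        long n = 0;
        for (double x : values) {
            n++;
            m += (x - m) / n;
        }
        return n == 0 ? Double.NaN : m;
    }

    // Two-pass computation for array data: the first pass computes a plain
    // mean, the second pass corrects it by the mean of the residuals.
    static double twoPassMean(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double sum = 0;
        for (double x : values) {
            sum += x;
        }
        double m = sum / values.length;
        double correction = 0;
        for (double x : values) {
            correction += x - m;
        }
        return m + correction / values.length;
    }
}
```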
Several of the statistics can be efficiently computed for integer types using different algorithms. For example:
mean = sum(x) / n
variance = 1/n * [ sum(x^2) - (sum(x))^2 / n ]
These can operate in a single pass and are not subject to loss of precision if the sums are accumulated with enough integer bits. For a stream, the sums must be able to accept up to 2^63 observations. For an array, the maximum number of observations is ~2^31, which allows some optimisation. Special implementations of signed and unsigned summation arithmetic allow the sums to avoid BigInteger. The final statistic can be computed using extended floating-point precision (e.g. double-double).
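A rough sketch of the idea for int values, under stated assumptions: IntVarianceSketch and all of its members are hypothetical names. The sum of squares is held in a 128-bit unsigned accumulator built from two longs, which is wide enough for 2^63 squares of at most 2^62 each (2^125). The plain long sum is only safe for array-sized input, since a stream of 2^63 values would need about 94 bits. BigInteger and BigDecimal appear only in the final step, standing in for the double-double arithmetic mentioned above.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

// Hypothetical class for illustration; not part of any library API.
final class IntVarianceSketch {
    private long n;     // number of observations
    private long sum;   // exact for array-sized input: |sum| <= 2^31 * 2^31 = 2^62
    private long sqHi;  // upper 64 bits of sum(x^2)
    private long sqLo;  // lower 64 bits of sum(x^2), treated as unsigned

    static IntVarianceSketch of(int... values) {
        IntVarianceSketch s = new IntVarianceSketch();
        for (int v : values) {
            s.accept(v);
        }
        return s;
    }

    void accept(int x) {
        n++;
        sum += x;
        long sq = (long) x * x;  // in [0, 2^62], cannot overflow a long
        long lo = sqLo + sq;
        // The low word wrapped if the result is unsigned-less than before.
        if (Long.compareUnsigned(lo, sqLo) < 0) {
            sqHi++;
        }
        sqLo = lo;
    }

    // Exact sum of squares; used only once, outside the accumulation loop.
    BigInteger sumOfSquares() {
        return BigInteger.valueOf(sqHi).shiftLeft(64)
            .add(new BigInteger(Long.toUnsignedString(sqLo)));
    }

    // mean = sum(x) / n
    double mean() {
        if (n == 0) {
            return Double.NaN;
        }
        return new BigDecimal(sum)
            .divide(new BigDecimal(n), MathContext.DECIMAL128).doubleValue();
    }

    // variance = [ n * sum(x^2) - (sum(x))^2 ] / n^2, numerator computed exactly.
    double variance() {
        if (n == 0) {
            return Double.NaN;
        }
        BigInteger count = BigInteger.valueOf(n);
        BigInteger s = BigInteger.valueOf(sum);
        BigInteger num = count.multiply(sumOfSquares()).subtract(s.multiply(s));
        return new BigDecimal(num)
            .divide(new BigDecimal(count.multiply(count)), MathContext.DECIMAL128)
            .doubleValue();
    }
}
```

Because the numerator is formed from exact integers, the subtraction does not suffer the cancellation that the same formula shows in floating point when the values are large relative to their spread.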
Issue Links
- is a child of STATISTICS-71 Implementation of Univariate Statistics (Open)