Details
- Improvement
- Status: Closed
- Minor
- Resolution: Implemented
- None
- None
- Easy
Description
The descriptive module defines DoubleStatistic as a composite interface:
public interface DoubleStatistic extends DoubleConsumer, DoubleSupplier
The implementations compute on a stream of double values, and provide factory methods to compute on a double[].
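As a rough illustration of how such a composite interface can be used, the sketch below declares the interface as quoted above and adds one implementation. `MaxSketch` and its `of` factory are hypothetical names invented for this example; they are not taken from the library, and the interface is declared without `public` only so that the example fits in a single source file.

```java
import java.util.function.DoubleConsumer;
import java.util.function.DoubleSupplier;

// Composite interface as quoted in the description (package-private here
// so that the whole sketch compiles as one file).
interface DoubleStatistic extends DoubleConsumer, DoubleSupplier {
}

// Hypothetical implementation for illustration only: tracks the maximum.
final class MaxSketch implements DoubleStatistic {
    private double max = Double.NEGATIVE_INFINITY;

    // Hypothetical factory method that computes on a double[].
    static MaxSketch of(double... values) {
        final MaxSketch s = new MaxSketch();
        for (final double v : values) {
            s.accept(v);
        }
        return s;
    }

    // DoubleConsumer: add one observation.
    @Override
    public void accept(double value) {
        max = Math.max(max, value);
    }

    // DoubleSupplier: return the current value of the statistic.
    @Override
    public double getAsDouble() {
        return max;
    }
}
```

Because the statistic is a DoubleConsumer, an instance can be passed directly to a stream, e.g. `DoubleStream.of(1, 3, 2).forEach(stat)`, and the result read back with `stat.getAsDouble()`.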
The moment-based statistics (mean, variance, skewness, kurtosis) use a rolling algorithm for numerical stability when values are added one at a time, and perform a two-pass computation over array data.
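The difference between the two approaches can be sketched for the mean. This is a minimal example using common textbook formulations (an incremental update, and a second pass that adds the mean of the residuals as a correction); it is an assumption that these resemble the module's code, and the class and method names are hypothetical.

```java
// Hypothetical helper class; not part of the library.
final class MeanSketch {
    private MeanSketch() {}

    // Rolling (updating) mean: suitable for a stream of values.
    static double rollingMean(double[] values) {
        double mean = 0;
        long n = 0;
        for (final double x : values) {
            n++;
            // Move the mean towards x by 1/n of the deviation.
            mean += (x - mean) / n;
        }
        return n == 0 ? Double.NaN : mean;
    }

    // Two-pass mean: requires all the data, e.g. an array.
    static double twoPassMean(double[] values) {
        final int n = values.length;
        // First pass: initial estimate of the mean.
        double sum = 0;
        for (final double x : values) {
            sum += x;
        }
        final double mean = sum / n;
        // Second pass: the sum of the residuals is zero in exact
        // arithmetic, so any remainder corrects the rounding error.
        double correction = 0;
        for (final double x : values) {
            correction += x - mean;
        }
        return mean + correction / n;
    }
}
```

Both methods return NaN for empty input in this sketch.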
Several of the statistics can be efficiently computed for integer types using different algorithms. For example:
mean = sum(x) / n
variance = 1/n * [ sum(x^2) - (sum(x))^2 / n ]
These can be computed in a single pass and are not subject to loss of precision if the sums are accumulated with enough integer bits of precision. For a stream, the sums must be able to accept up to 2^63 observations. For an array, the maximum number of observations is ~2^31, which allows some optimisation. Special implementations of signed and unsigned summation arithmetic allow the sums to avoid BigInteger. The final statistic can be computed using extended floating-point precision (e.g. double-double).
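The formulas above can be sketched for an int[] as follows. This is an illustrative example under stated assumptions, not the implemented code: the class name is hypothetical, the sum of squares is held in a hand-rolled unsigned 128-bit accumulator (two long values), and the final division uses BigInteger and BigDecimal for brevity where the description suggests double-double arithmetic.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

// Hypothetical helper class; not part of the library.
final class IntStatsSketch {
    private IntStatsSketch() {}

    // mean = sum(x) / n
    static double mean(int[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        // |sum| < 2^31 * 2^31 = 2^62 for any array, so a long cannot overflow.
        long sum = 0;
        for (final int x : values) {
            sum += x;
        }
        return new BigDecimal(sum)
            .divide(new BigDecimal(values.length), MathContext.DECIMAL128)
            .doubleValue();
    }

    // Population variance = 1/n * [ sum(x^2) - (sum(x))^2 / n ]
    //                     = [ n * sum(x^2) - (sum(x))^2 ] / n^2
    static double variance(int[] values) {
        final int n = values.length;
        if (n == 0) {
            return Double.NaN;
        }
        long sum = 0;
        // Unsigned 128-bit accumulator (hi:lo) for sum(x^2) < 2^93.
        long lo = 0;
        long hi = 0;
        for (final int x : values) {
            sum += x;
            final long sq = (long) x * x; // 0 <= sq <= 2^62
            lo += sq;
            // Unsigned overflow of the low word carries into the high word.
            if (Long.compareUnsigned(lo, sq) < 0) {
                hi++;
            }
        }
        // Exact integer arithmetic for the numerator and denominator.
        final BigInteger sumSq = BigInteger.valueOf(hi).shiftLeft(64)
            .add(new BigInteger(Long.toUnsignedString(lo)));
        final BigInteger bn = BigInteger.valueOf(n);
        final BigInteger bs = BigInteger.valueOf(sum);
        final BigInteger numerator = bn.multiply(sumSq).subtract(bs.multiply(bs));
        // Round once to 34 decimal digits, then to double.
        return new BigDecimal(numerator)
            .divide(new BigDecimal(bn.multiply(bn)), MathContext.DECIMAL128)
            .doubleValue();
    }
}
```

The loop itself uses only long arithmetic; BigInteger appears once, after the pass over the data. Because the sums are exact, values of large magnitude with a small spread do not suffer the cancellation that the same formula shows in floating point.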
Issue Links
- is a child of
  - STATISTICS-71 Implementation of Univariate Statistics (Closed)
- relates to
  - STATISTICS-7 Stream-based Java statistical processing (Closed)
  - STATISTICS-54 [GSoC] Summary statistics API for Java 8 streams (Closed)