Details

Type: Bug

Status: Closed

Priority: Minor

Resolution: Fixed

Affects Version/s: 1.2

Fix Version/s: 2.0

Labels: None
Description
The smallest p-value returned by TTestImpl.tTest() is the machine epsilon, which is 2.220446E-16 with IEEE 754 64-bit double-precision floats.
We found this bug while porting some analysis software from R to Java, and noticed that the p-values did not match. We believe we have identified why this happens in commons-math-1.2, and a possible solution.
Please be gentle, as I am not a statistics expert!
org.apache.commons.math.stat.inference.TTestImpl currently implements the following method to calculate the p-value for a 2-sided, 2-sample t-test:
protected double tTest(double m1, double m2, double v1, double v2, double n1, double n2)
and it returns:
1.0 - distribution.cumulativeProbability(-t, t);
at line 1034 in version 1.2.
double cumulativeProbability(double x0, double x1) is implemented by org.apache.commons.math.distribution.AbstractDistribution, and returns:
return cumulativeProbability(x1) - cumulativeProbability(x0);
So in essence, the p-value returned by TTestImpl.tTest() is:
1.0 - (cumulativeProbability(t) - cumulativeProbability(-t))
For largeish t-statistics, cumulativeProbability(-t) can get quite small, and cumulativeProbability(t) can get very close to 1.0. When cumulativeProbability(-t) is less than the machine epsilon, we get p-values equal to zero because:
1.0 - 1.0 + 0.0 = 0.0
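The cancellation can be reproduced without the library at all. The snippet below is only an illustration: the class name CancellationDemo and the tail value 1e-20 are made up for this example and stand in for a cumulativeProbability(-t) that has fallen below the machine epsilon.

```java
public class CancellationDemo {

    // Mirrors the shape of the current calculation:
    // 1.0 - (cumulativeProbability(t) - cumulativeProbability(-t))
    static double pValueBySubtraction(double upperCdf, double lowerCdf) {
        return 1.0 - (upperCdf - lowerCdf);
    }

    public static void main(String[] args) {
        // Hypothetical lower-tail probability, well below the machine epsilon.
        double lowerCdf = 1e-20;
        // By symmetry the upper CDF is 1 - lowerCdf, which rounds to exactly 1.0.
        double upperCdf = 1.0 - lowerCdf;

        System.out.println("machine epsilon = " + Math.ulp(1.0));
        System.out.println("upperCdf == 1.0 : " + (upperCdf == 1.0));
        // The tiny tail is absorbed twice, so the result collapses to 0.0.
        System.out.println("p-value         = " + pValueBySubtraction(upperCdf, lowerCdf));
    }
}
```

The information about the tail is already lost when the upper CDF is rounded to 1.0, so no later arithmetic can bring it back.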
An alternative calculation for the p-value of a 2-sided, 2-sample t-test is:
p = 2.0 * cumulativeProbability(-t)
This calculation does not suffer from the machine epsilon problem, and we are now getting p-values much smaller than the 2.2E-16 limit we were seeing previously.
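As a rough side-by-side sketch of the two formulas, the code below takes the CDF as a parameter so it runs without commons-math. Everything named here is an assumption made for illustration: the class TwoSidedPValue, its two methods, and the logistic CDF, which is used only as a convenient symmetric stand-in and is not the t distribution. The sketch also takes Math.abs(t) itself, assuming the statistic may arrive with either sign.

```java
import java.util.function.DoubleUnaryOperator;

public class TwoSidedPValue {

    // Current approach: 1.0 - (cdf(t) - cdf(-t)). Loses the tail once cdf(t) rounds to 1.0.
    static double bySubtraction(DoubleUnaryOperator cdf, double t) {
        double a = Math.abs(t);
        return 1.0 - (cdf.applyAsDouble(a) - cdf.applyAsDouble(-a));
    }

    // Proposed approach: 2.0 * cdf(-t). Valid for distributions symmetric about zero.
    static double byLowerTail(DoubleUnaryOperator cdf, double t) {
        return 2.0 * cdf.applyAsDouble(-Math.abs(t));
    }

    // Stand-in symmetric CDF (standard logistic), NOT the t distribution.
    static final DoubleUnaryOperator LOGISTIC_CDF = x -> 1.0 / (1.0 + Math.exp(-x));

    public static void main(String[] args) {
        double t = 50.0; // a largeish statistic
        System.out.println("by subtraction: " + bySubtraction(LOGISTIC_CDF, t));
        System.out.println("by lower tail : " + byLowerTail(LOGISTIC_CDF, t));
    }
}
```

With this stand-in, the subtraction form returns 0.0 for t = 50 while the lower-tail form returns a small positive value. For moderate statistics the two agree to within rounding error, so the change should only matter in the extreme tails.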