Implement a map/reduce version of the single sample t test to test whether a sample of n subjects comes from a population in which the mean equals a particular value.
For a large dataset, say n in the millions of rows, one can test whether the sample (large as it is) is consistent with the specified population mean.
The inputs are as follows:
1) The specified population mean to be tested against.
2) The hypothesis direction: one of "two.sided", "less", or "greater".
3) The confidence level or alpha.
4) A flag to indicate whether the data are paired or not paired.
The procedure is as follows:
1. Use Map/Reduce to calculate the mean of the sample.
2. Use Map/Reduce to calculate the standard error of the sample mean.
3. Use Map/Reduce to calculate the t statistic.
4. Determine the degrees of freedom: n - 1 for a single sample (the equal-variance assumption matters only when two independent samples are compared).
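The steps above can be collapsed into a single pass over the data, as in the following minimal Python sketch. It assumes the data arrive as in-memory partitions (plain lists of numbers) rather than through a real Map/Reduce framework. The names map_partition, merge and one_sample_t are hypothetical, and the pairwise merge of (count, mean, sum of squared deviations) is one numerically stable choice, not necessarily the approach intended here.

```python
from functools import reduce
from math import sqrt


def map_partition(values):
    """Map step: summarize one partition as (count, mean, M2).

    M2 is the sum of squared deviations from the partition mean,
    accumulated with Welford's online update.
    """
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return n, mean, m2


def merge(a, b):
    """Reduce step: combine two partition summaries into one."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def one_sample_t(partitions, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n, mean, m2 = reduce(merge, map(map_partition, partitions), (0, 0.0, 0.0))
    if n < 2 or m2 == 0.0:
        raise ValueError("need at least two rows with non-zero variance")
    variance = m2 / (n - 1)        # unbiased sample variance
    std_err = sqrt(variance / n)   # standard error of the sample mean
    return (mean - mu0) / std_err, n - 1


# Illustrative data only: three partitions, tested against a mean of 5.0.
partitions = [[5.1, 4.9, 5.6], [5.8, 6.0], [5.2, 5.5, 5.3]]
t_stat, df = one_sample_t(partitions, mu0=5.0)
```

Because each partition summary has a fixed size and merge is associative, the same reducer could also serve as a combiner, so only three numbers per partition need to cross the network.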
The outputs are as follows:
1) The value of the t-statistic.
2) The p-value for the test.
3) Flag that is true if the null hypothesis can be rejected with confidence 1 - alpha; false otherwise.
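The p-value and the rejection flag can be derived from the t-statistic and the degrees of freedom on a single machine, since no further pass over the data is needed. The sketch below is an assumption-laden illustration: t_pdf, t_cdf, p_value and decide are hypothetical names, and the cumulative distribution is obtained by Simpson's-rule integration of the Student t density, which is adequate for moderate |t| but is not a substitute for a statistics library in production.

```python
from math import exp, lgamma, log, log1p, pi


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log1p(x * x / df))


def t_cdf(t, df):
    """P(T <= t), integrating the density over [0, |t|] with Simpson's rule."""
    a = abs(t)
    if a == 0.0:
        return 0.5
    steps = 2 * max(1000, int(a * 100))  # even number of intervals
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = min(1.0, 0.5 + total * h / 3)
    return upper if t > 0 else 1.0 - upper


def p_value(t_stat, df, alternative="two.sided"):
    """p-value for the given hypothesis direction."""
    if alternative == "two.sided":
        return 2.0 * (1.0 - t_cdf(abs(t_stat), df))
    if alternative == "less":
        return t_cdf(t_stat, df)
    if alternative == "greater":
        return 1.0 - t_cdf(t_stat, df)
    raise ValueError("alternative must be 'two.sided', 'less' or 'greater'")


def decide(t_stat, df, alternative="two.sided", alpha=0.05):
    """Return (p-value, flag); the flag is True when H0 is rejected at level alpha."""
    p = p_value(t_stat, df, alternative)
    return p, p < alpha


# Illustrative call with made-up numbers.
p, reject = decide(2.5, 7, "two.sided", 0.05)
```

For samples in the millions of rows, the t distribution is practically indistinguishable from the standard normal, so a normal approximation would be a reasonable alternative to numerical integration.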