Description
Implement a map/reduce version of the one-sample t-test, which tests whether a sample of n subjects comes from a population whose mean equals a specified value.
For a large dataset, say millions of rows, this makes it possible to test whether the sample, large as it is, comes from a population with that mean.
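For reference, these are the quantities the job has to produce, written for observations x_1, ..., x_n and the hypothesized mean \mu_0 (for paired data, x_i is the within-pair difference). Under the null hypothesis, t follows Student's t distribution with \nu degrees of freedom, exactly for normally distributed data and approximately for large n.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2, \qquad
\mathrm{SE} = \frac{s}{\sqrt{n}}, \qquad
t = \frac{\bar{x} - \mu_0}{\mathrm{SE}}, \qquad
\nu = n - 1
```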
Input:
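1) the population mean to test against
2) the direction of the alternative hypothesis, i.e. "two.sided", "less", or "greater"
3) the confidence level (1 - alpha) or, equivalently, the significance level alpha
4) a flag indicating whether the data are paired (for paired data, the test is applied to the within-pair differences)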
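The four inputs mirror the arguments of R's `t.test` (`mu`, `alternative`, `conf.level`, `paired`). Below is a minimal sketch of how a driver might bundle and validate them, assuming a Python driver. `TTestParams` and its field names are hypothetical and are not an existing Mahout API.

```python
from dataclasses import dataclass

# Accepted values for the alternative hypothesis, spelled as in R's t.test.
ALTERNATIVES = ("two.sided", "less", "greater")


@dataclass(frozen=True)
class TTestParams:
    """Hypothetical container for the job inputs (not a Mahout API)."""

    mu0: float                      # 1) population mean to test against
    alternative: str = "two.sided"  # 2) direction of the alternative hypothesis
    alpha: float = 0.05             # 3) significance level; confidence = 1 - alpha
    paired: bool = False            # 4) if True, test the within-pair differences

    def __post_init__(self) -> None:
        if self.alternative not in ALTERNATIVES:
            raise ValueError(f"alternative must be one of {ALTERNATIVES}")
        if not 0.0 < self.alpha < 1.0:
            raise ValueError("alpha must lie strictly between 0 and 1")

    @classmethod
    def from_confidence(cls, mu0: float, confidence: float, **kwargs) -> "TTestParams":
        # Input 3) may be given as a confidence level instead of alpha.
        return cls(mu0=mu0, alpha=1.0 - confidence, **kwargs)

    @property
    def confidence(self) -> float:
        return 1.0 - self.alpha
```

Validating in `__post_init__` means a misspelled alternative such as "both" fails before any Map/Reduce job is launched.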
The procedure is as follows:
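1. Use Map/Reduce to calculate the sample mean.
2. Use Map/Reduce to calculate the standard error of the sample mean.
3. Use Map/Reduce to calculate the t statistic.
4. Determine the degrees of freedom, which for a one-sample (or paired) test is n - 1. The equal vs. unequal sample variance distinction (e.g. Welch's approximation) applies only when comparing two independent samples.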
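A minimal single-process sketch of steps 1 to 4, assuming each partition's rows arrive as an iterable of floats. The map step reduces a partition to (count, mean, sum of squared deviations). The reduce step merges these partials, so the mean, standard error and t statistic all come out of one pass over the data. The names `Moments`, `map_partition`, `map_paired_partition`, `combine` and `one_sample_t` are hypothetical and do not correspond to an existing Mahout or Hadoop API.

```python
import math
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Moments:
    """Partial statistics: count, mean, and sum of squared deviations (M2)."""

    n: int
    mean: float
    m2: float


EMPTY = Moments(0, 0.0, 0.0)


def map_partition(values: Iterable[float]) -> Moments:
    """Map step: one pass over a partition using Welford's update."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return Moments(n, mean, m2)


def map_paired_partition(pairs: Iterable[Tuple[float, float]]) -> Moments:
    """Map step for paired data: turn each pair into its difference first."""
    return map_partition(x - y for x, y in pairs)


def combine(a: Moments, b: Moments) -> Moments:
    """Reduce step: merge two partials with Chan et al.'s parallel update."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Moments(n, mean, m2)


def one_sample_t(partials: Iterable[Moments], mu0: float) -> Tuple[float, int]:
    """Fold the mapped partials, then return (t statistic, degrees of freedom)."""
    total = reduce(combine, partials, EMPTY)
    if total.n < 2:
        raise ValueError("need at least two observations")
    variance = total.m2 / (total.n - 1)   # sample variance s^2
    se = math.sqrt(variance / total.n)    # standard error of the sample mean
    if se == 0.0:
        raise ValueError("all observations are identical; t is undefined")
    t = (total.mean - mu0) / se
    return t, total.n - 1                 # df = n - 1


if __name__ == "__main__":
    partitions = [[2.0, 4.0, 4.0], [4.0, 5.0, 5.0, 7.0, 9.0]]
    t, df = one_sample_t((map_partition(p) for p in partitions), mu0=4.0)
    print(f"t = {t:.4f}, df = {df}")  # prints: t = 1.3229, df = 7
```

Carrying the mean and the sum of squared deviations, rather than raw sums of x and x squared, avoids catastrophic cancellation when n is in the millions. Because `combine` is associative, the same function could also serve as a combiner ahead of the reducer.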
Output
1) The value of the t statistic.
2) The p-value of the test.
3) A flag that is true if the null hypothesis can be rejected with confidence 1 - alpha, and false otherwise.
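The sketch below covers outputs 2) and 3), assuming the t statistic has already been computed. Because this issue targets samples of millions of rows, the sketch deliberately substitutes the standard normal distribution for Student's t with n - 1 degrees of freedom, since the two are practically indistinguishable at that size. For small samples, an exact t tail such as `scipy.stats.t.sf(t, df)` should be used instead. `p_value_and_decision` is a hypothetical name.

```python
import math
from typing import Tuple


def normal_sf(z: float) -> float:
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def p_value_and_decision(t: float, alternative: str = "two.sided",
                         alpha: float = 0.05) -> Tuple[float, bool]:
    """Return (p-value, reject flag) under a large-sample normal approximation.

    Assumption: df = n - 1 is large enough that Student's t is
    effectively standard normal.
    """
    if alternative == "two.sided":
        p = min(1.0, 2.0 * normal_sf(abs(t)))
    elif alternative == "greater":   # H1: population mean > mu0
        p = normal_sf(t)
    elif alternative == "less":      # H1: population mean < mu0
        p = normal_sf(-t)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    # Reject H0 with confidence 1 - alpha when p falls below alpha.
    return p, p < alpha
```

For example, `p_value_and_decision(3.0, "greater")` gives a p-value of about 0.00135 and rejects at alpha = 0.05, while the same t with `"less"` does not reject.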
References
http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html
Activity
Field | Original Value | New Value
Status | Open | Resolved
Resolution | | Fixed
Status | Resolved | Closed
Affects Version/s | 0.1 |
Affects Version/s | Backlog |
Fix Version/s | | Backlog
Transition | Time In Source Status | Execution Times | Last Executer | Last Execution Date
Open → Resolved | 324d 21h 13m | 1 | Sebastian Schelter | 11/Mar/13 15:39
Resolved → Closed | 328d 16h 27m | 1 | Suneel Marthi | 03/Feb/14 08:06
I am not sure that I see the value here. All you need for this calculation is the means, the squared differences and the counts.
Do we really need this in Mahout when 3 lines of Pig suffice?