[MAHOUT-1000] Implementation of Single Sample T-Test using Map Reduce/Mahout - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.1
Fix Version/s: None
Component/s: classic
Labels:
- newbie
Environment:

Linux, Mac OS, Hadoop 0.20.2, Mahout 0.x

Description

Implement a map/reduce version of the single-sample t-test, which tests whether a sample of n subjects comes from a population whose mean equals a specified value.

For a large dataset, say millions of rows, the test checks whether the sample mean (large as the sample is) is consistent with the hypothesized population mean.

Input:
1) The specified population mean to test against.
2) The hypothesis direction: "two.sided", "less", or "greater".
3) The confidence level, or alpha.
4) A flag indicating whether the samples are paired or not paired.

The procedure is as follows:
1. Use Map/Reduce to calculate the mean of the sample.
2. Use Map/Reduce to calculate the standard error of the sample mean.
3. Use Map/Reduce to calculate the t-statistic.
4. Determine the degrees of freedom (n - 1 for a single sample; in the unpaired case this depends on whether equal sample variances are assumed).
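
The procedure can be sketched in plain Python, with no Hadoop or Mahout API involved. This is only an illustration under stated assumptions: each input split is reduced by a mapper to a (count, mean, M2) summary, and summaries are merged pairwise in the reducer using the standard parallel variance update. The names map_partition, merge, and t_statistic are hypothetical and do not come from Mahout.

```python
from functools import reduce
from math import sqrt


def map_partition(values):
    """Mapper (hypothetical): summarise one input split as (count, mean, M2).

    M2 is the running sum of squared deviations from the mean (Welford update).
    """
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return n, mean, m2


def merge(a, b):
    """Reducer/combiner (hypothetical): merge two partial summaries."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def t_statistic(splits, mu0):
    """Return (t, degrees_of_freedom) for a single-sample test against mu0."""
    n, mean, m2 = reduce(merge, map(map_partition, splits), (0, 0.0, 0.0))
    if n < 2:
        raise ValueError("need at least two observations")
    variance = m2 / (n - 1)       # unbiased sample variance
    std_err = sqrt(variance / n)  # standard error of the sample mean
    if std_err == 0.0:
        raise ValueError("sample variance is zero; t is undefined")
    return (mean - mu0) / std_err, n - 1


# Three "splits" standing in for HDFS input splits.
splits = [[2.0, 4.0], [4.0, 4.0, 5.0], [5.0, 7.0, 9.0]]
t, df = t_statistic(splits, mu0=4.0)
```

Because merge is associative, the same function could serve as both combiner and reducer, so only one small tuple per split needs to cross the network.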

Output:
1) The value of the t-statistic.
2) The p-value for the test.
3) Flag that is true if the null hypothesis can be rejected with confidence 1 - alpha; false otherwise.
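
The p-value and rejection flag could be derived from the t-statistic along the following lines. This sketch makes a loud assumption: it uses the standard normal distribution in place of Student's t, which is only reasonable when n is very large (the millions-of-rows case described above). A real implementation would use the t distribution with the appropriate degrees of freedom. The function name decide is hypothetical.

```python
from statistics import NormalDist


def decide(t, alpha=0.05, alternative="two.sided"):
    """Return (p_value, reject_null) for a given t-statistic.

    Assumption: large-sample normal approximation to Student's t.
    """
    z = NormalDist()  # standard normal, mean 0 and sigma 1
    if alternative == "two.sided":
        p = 2.0 * (1.0 - z.cdf(abs(t)))
    elif alternative == "less":
        p = z.cdf(t)
    elif alternative == "greater":
        p = 1.0 - z.cdf(t)
    else:
        raise ValueError("alternative must be 'two.sided', 'less', or 'greater'")
    return p, p < alpha


p_value, reject = decide(-3.0, alpha=0.05, alternative="less")
```

The three direction strings mirror the input options listed in the description; the flag is true exactly when the p-value falls below alpha.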

References
http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html

People

Assignee: Unassigned

Reporter: Dev Lakhani

Votes: 0

Watchers: 3

Dates

Created: 20/Apr/12 18:25

Updated: 31/Jan/24 22:16

Resolved: 11/Mar/13 15:39

Time Tracking

Estimated: 672h

Remaining: 672h

Logged: Not Specified