Mahout / MAHOUT-1000

Implementation of Single Sample T-Test using Map Reduce/Mahout

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.1
    • Fix Version/s: None
    • Component/s: Math
    • Labels:
    • Environment: Linux, Mac OS, Hadoop 0.20.2, Mahout 0.x

      Description

      Implement a map/reduce version of the single-sample t-test, which tests whether a sample of n subjects comes from a population whose mean equals a particular value.

      For a large dataset, say n in the millions of rows, one can then test whether the sample (large as it is) comes from a population with the specified mean.

      Input:
      1) specified population mean to be tested against
      2) hypothesis direction: "two.sided", "less", or "greater"
      3) confidence level (1 - alpha), or equivalently the significance level alpha
      4) flag to indicate paired or not paired

      The procedure is as follows:
      1. Use Map/Reduce to calculate the mean of the sample.
      2. Use Map/Reduce to calculate the standard error of the sample mean.
      3. Use Map/Reduce to calculate the t statistic.
      4. Determine the degrees of freedom: n - 1 for a single sample (the equal-variance question only arises when two samples are compared).
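
      The steps above can be sketched in plain Python to show the shape of the computation. This is a minimal illustration, not Mahout or Hadoop code: the names map_partition, merge, and t_statistic are hypothetical, each inner list stands in for one input split, and the per-split summary (count, mean, sum of squared deviations) is merged with the standard pairwise update so the result does not depend on how the data is partitioned.

```python
import math
from functools import reduce


def map_partition(values):
    """Map step (hypothetical helper): summarize one split as (count, mean, M2).

    M2 is the sum of squared deviations from the split's own mean,
    accumulated in a single pass with Welford's update.
    """
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return n, mean, m2


def merge(a, b):
    """Reduce step (hypothetical helper): combine two partial summaries."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def t_statistic(partitions, mu0):
    """Return (t, degrees_of_freedom, sample_mean, standard_error)."""
    n, mean, m2 = reduce(merge, map(map_partition, partitions), (0, 0.0, 0.0))
    if n < 2:
        raise ValueError("need at least two observations")
    variance = m2 / (n - 1)             # unbiased sample variance
    std_error = math.sqrt(variance / n)  # standard error of the sample mean
    if std_error == 0.0:
        raise ValueError("sample has zero variance; t is undefined")
    t = (mean - mu0) / std_error
    return t, n - 1, mean, std_error
```

      For example, calling t_statistic([[2.0, 4.0], [4.0, 4.0, 5.0], [5.0, 7.0, 9.0]], mu0=4.0) treats the three lists as three splits of one sample of eight values. In a real job the map and reduce functions would run as Hadoop tasks, and the final division would happen in the driver.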

      Output:
      1) The value of the t-statistic.
      2) The p-value for the test.
      3) Flag that is true if the null hypothesis can be rejected with confidence 1 - alpha; false otherwise.
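
      The p-value and the rejection flag could be derived from the t statistic roughly as follows. This is a sketch under a stated assumption: it replaces the Student t distribution with the standard normal, which is only reasonable when the degrees of freedom are large (as with the millions of rows this issue targets). An exact version would use a t-distribution CDF with n - 1 degrees of freedom from a statistics library. The function names p_value and reject_null are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def p_value(t, alternative="two.sided"):
    """Approximate p-value for a t statistic (normal approximation, large n only)."""
    if alternative == "two.sided":
        # Probability of a statistic at least as extreme in either tail.
        return 2.0 * _STD_NORMAL.cdf(-abs(t))
    if alternative == "less":
        # Lower tail: small t supports "mean is less than the tested value".
        return _STD_NORMAL.cdf(t)
    if alternative == "greater":
        # Upper tail, computed as cdf(-t) to avoid cancellation in 1 - cdf(t).
        return _STD_NORMAL.cdf(-t)
    raise ValueError("alternative must be 'two.sided', 'less', or 'greater'")


def reject_null(t, alpha=0.05, alternative="two.sided"):
    """Return (p, flag); flag is True when the null can be rejected at level alpha."""
    p = p_value(t, alternative)
    return p, p < alpha
```

      The three alternative values mirror the hypothesis directions listed under Input, and the returned flag corresponds to output item 3.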

      References
      http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html

          People

          • Assignee: Unassigned
          • Reporter: Dev Lakhani
          • Votes: 0
          • Watchers: 3

              Time Tracking

              • Original Estimate: 672h
              • Remaining Estimate: 672h
              • Time Spent: Not Specified
