  1. Commons Math
  2. MATH-1310

Improve accuracy and performance of 2-sample Kolmogorov-Smirnov test

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5
    • Fix Version/s: 3.6
    • Labels:
      None

      Description

      As of 3.5, the exactP method used to compute exact p-values for 2-sample Kolmogorov-Smirnov tests is very slow, because it is based on a naive implementation that enumerates all n-m partitions of the combined sample. As a result, its use is not recommended for problems where the product of the two sample sizes exceeds 100, and the kolmogorovSmirnovTest method uses it only for samples below this threshold. For sample size products between 100 and 10000, the point above which the asymptotic KS distribution can be used, kolmogorovSmirnovTest currently uses Monte Carlo simulation. Convergence is poor for many problem instances, which leads to inaccurate p-values.

      To eliminate the need for the Monte Carlo simulation and increase the performance of exactP itself, a faster exactP implementation should be added. This can be implemented by unwinding the recursive functions defined in Chapter 5, table 5.2 in:

      Wilcox, Rand. 2012. Introduction to Robust Estimation and Hypothesis Testing, 3rd Ed., Chapter 5. Academic Press.
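
      For illustration, here is a minimal sketch of the kind of non-enumerating computation proposed above. It is not the code that went into 3.6 and is not transcribed from Wilcox's table; it uses the standard lattice-path formulation of the same idea. Each ordering of the combined sample is a path from (0, 0) to (n, m), and the recursion tracks the fraction of paths that never reach a deviation of d. The class name KsExactSketch, the method name exactPValue, and the 1e-7 rounding tolerance are assumptions made for this example only.

```java
/**
 * Hypothetical sketch (not the Commons Math implementation): exact
 * P(D_{n,m} >= d) for the 2-sample KS statistic, computed in O(n*m) time
 * and O(m) memory instead of enumerating all orderings of the combined sample.
 */
public final class KsExactSketch {

    private KsExactSketch() {}

    public static double exactPValue(double d, int n, int m) {
        // |i/n - j/m| >= d  is equivalent to  |i*m - j*n| >= d*n*m.
        // Working on the integer scale avoids comparing rounded fractions.
        // The 1e-7 tolerance is an assumption of this sketch: it absorbs
        // rounding error when d was itself computed as k/(n*m).
        long k = (long) Math.ceil(d * n * m - 1e-7);
        if (k <= 0) {
            return 1.0; // every path "reaches" a non-positive threshold
        }

        // row[j] holds the fraction of the C(i+j, i) paths to (i, j) that
        // stay strictly inside the band |i*m - j*n| < k at every step.
        double[] row = new double[m + 1];
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                if (i == 0 && j == 0) {
                    row[0] = 1.0;
                    continue;
                }
                long diff = Math.abs((long) i * m - (long) j * n);
                if (diff >= k) {
                    row[j] = 0.0; // this lattice point lies outside the band
                    continue;
                }
                // row[j] still holds the value for (i-1, j);
                // row[j-1] has already been updated to (i, j-1).
                double fromPrevI = i > 0 ? row[j] * i / (i + j) : 0.0;
                double fromPrevJ = j > 0 ? row[j - 1] * j / (i + j) : 0.0;
                row[j] = fromPrevI + fromPrevJ;
            }
        }
        // row[m] is now P(D < d); the p-value is its complement.
        return 1.0 - row[m];
    }

    public static void main(String[] args) {
        // For n = m = 2, two of the six orderings (XXYY, YYXX) give D = 1.
        System.out.println(exactPValue(1.0, 2, 2));
    }
}
```

      Working with path fractions rather than raw path counts keeps every intermediate value in [0, 1], so the recursion does not overflow as the sample sizes grow. That is a design choice of this sketch, not a claim about the eventual fix.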

            People

            • Assignee:
              Unassigned
              Reporter:
              psteitz Phil Steitz