Description
We have implemented a one-sample, two-sided version of the Kolmogorov-Smirnov test, which tests the null hypothesis that the sample is drawn from a given continuous distribution. The functionality is exposed through three functions. The first takes an RDD[Double] of the data and a lambda that calculates the CDF. The second takes an RDD[Double] and an Iterator[(Double,Double,Double)] => Iterator[Double], and uses mapPartitions so that the calculation stays efficient when the CDF requires a non-serializable object (e.g. the real distributions in Apache Commons Math). The third takes an RDD[Double] and the String name of the theoretical distribution to test against. The appropriate result class has been added, along with tests in HypothesisTestSuite.
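For readers unfamiliar with the statistic itself, the following is a minimal local sketch of the two-sided, one-sample KS statistic D = sup |F_n(x) - F(x)|, computed on a plain Scala collection rather than an RDD. It is not the code from this patch: the object name KsSketch and the function ksStatistic are hypothetical, it does not compute a p-value, and it only illustrates the "data plus CDF lambda" shape described above.

```scala
// Hypothetical illustration only; not the Spark implementation.
object KsSketch {
  /** Two-sided one-sample KS statistic for `sample` against the theoretical `cdf`. */
  def ksStatistic(sample: Seq[Double], cdf: Double => Double): Double = {
    require(sample.nonEmpty, "sample must be non-empty")
    val n = sample.length.toDouble
    sample.sorted.zipWithIndex.map { case (x, i) =>
      val f = cdf(x)
      // The empirical CDF jumps from i/n to (i + 1)/n at the i-th sorted point,
      // so the largest gap can occur just below or exactly at x.
      math.max(f - i / n, (i + 1) / n - f)
    }.max
  }
}

// Example: test a small sample against the Uniform(0, 1) CDF.
object KsSketchExample {
  val uniformCdf: Double => Double = x => math.min(1.0, math.max(0.0, x))
  val d: Double = KsSketch.ksStatistic(Seq(0.1, 0.2), uniformCdf)
}
```

A distributed version would presumably sort the RDD and evaluate the same two differences per partition, which is where constructing a non-serializable distribution object once per partition becomes useful.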
Issue Links
- is depended upon by: SPARK-8996 Add Python API for Kolmogorov-Smirnov Test (Resolved)
- relates to: SPARK-9902 Add Java and Python examples to user guide for 1-sample KS test (Resolved)
- links to