Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-8598

Implementation of 1-sample, two-sided, Kolmogorov Smirnov Test for RDDs

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • None
    • 1.5.0
    • MLlib
    • None

    Description

      We have implemented a 1-sample, two-sided version of the Kolmogorov Smirnov test, which tests the null hypothesis that the sample comes from a given continuous distribution. We provide various functions to access the functionality: namely, a function that takes an RDD[Double] of the data and a lambda to calculate the CDF, a function that takes an RDD[Double] and an Iterator[(Double,Double,Double)] => Iterator[Double] which uses mapPartition to provide an optimized way to perform the calculation when the CDF calculation requires a non-serializable object (e.g. the apache math commons real distributions), and finally a function that takes an RDD[Double] and a String name of the theoretical distribution to be used. The appropriate result class has been added, as well as tests to the HypothesisTestSuite

      Attachments

        Issue Links

          Activity

            People

              josepablocam Jose Cambronero
              josepablocam Jose Cambronero
              Xiangrui Meng Xiangrui Meng
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: