Spark / SPARK-15656

ChiSqTest for goodness of fit tests against a wrong uniform distribution by default

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.5.1, 1.6.1
    • Fix Version/s: None
    • Component/s: MLlib

    Description

      I've been running ChiSqTest to check whether my samples fit a uniform distribution.
      The documentation says that if a second vector to test against is not supplied as a parameter, the test runs against a uniform distribution. But when I pass samples drawn from a normal distribution, the computed p-value is 1.0, which is wrong.
      The problem is that in ChiSqTest.scala, the `chiSquared` function generates the wrong uniform distribution when the expected vector is not supplied.
      The default expected vector should be generated as:
      val expArr = if (expected.size == 0) Array.tabulate(size)(i => i.toDouble / size) else expected.toArray
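
For context, here is a minimal sketch in plain Scala (no Spark dependency) of what a goodness-of-fit statistic against a uniform distribution computes. It assumes that the observed vector holds per-category counts rather than raw samples, which would be consistent with the "Not A Problem" resolution. The names `ChiSqSketch`, `histogram`, and `chiSqUniform` are hypothetical and are not part of MLlib.

```scala
// Hypothetical helper object for illustration only; not part of Spark MLlib.
object ChiSqSketch {

  // Count how many samples fall into each of `bins` equal-width bins over [lo, hi).
  // Samples outside the range are ignored.
  def histogram(samples: Seq[Double], lo: Double, hi: Double, bins: Int): Array[Double] = {
    val counts = Array.fill(bins)(0.0)
    val width = (hi - lo) / bins
    for (x <- samples if x >= lo && x < hi) {
      // Guard against floating-point rounding pushing the index to `bins`.
      val idx = math.min(((x - lo) / width).toInt, bins - 1)
      counts(idx) += 1.0
    }
    counts
  }

  // Pearson chi-squared statistic against a uniform expectation:
  // every category is expected to hold the same share of the total count.
  def chiSqUniform(observed: Array[Double]): Double = {
    val expected = observed.sum / observed.length
    observed.map(o => (o - expected) * (o - expected) / expected).sum
  }
}
```

Under this reading, raw samples would first be binned, for example with `ChiSqSketch.histogram(samples, lo, hi, bins)`, and the resulting counts would be what is compared with the uniform expectation. Perfectly even counts such as `Array(25.0, 25.0, 25.0, 25.0)` give a statistic of 0, while counts concentrated in the middle bins give a large one.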

          People

            Assignee: Unassigned
            Reporter: Jieyuan Chen (chenjieyuan)
            Votes: 0
            Watchers: 1

              Time Tracking

                Original Estimate: 0.5h
                Remaining Estimate: 0.5h
                Time Spent: Not Specified