  1. Spark
  2. SPARK-15656

ChiSqTest for goodness of fit tests against a wrong uniform distribution by default

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.5.1, 1.6.1
    • Fix Version/s: None
    • Component/s: MLlib
    • Labels:

      Description

      I've been running ChiSqTest to check whether my samples fit a uniform distribution.
      The documentation says that if a second vector to test against is not supplied as a parameter, the test runs against a uniform distribution. But when I pass samples drawn from a normal distribution, the computed p-value is 1.0, which is wrong.
      The problem is that in ChiSqTest.scala, the `chiSquared` function generates a wrong uniform distribution when the expected vector is not supplied.
      I think the default expected values should be generated as:
      val expArr = if (expected.size == 0) Array.tabulate(size)(i => i.toDouble / size) else expected.toArray
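
For context on the "Not A Problem" resolution, here is a minimal plain-Scala sketch of Pearson's goodness-of-fit statistic. It assumes the observed vector holds per-bin counts (raw samples would have to be binned first) and that a uniform null hypothesis means equal weight in every bin. The names `GoodnessOfFitSketch`, `histogram`, `pearsonStatistic` and `uniformWeights` are hypothetical and not part of Spark, and this is not the code from ChiSqTest.scala.

```scala
// Illustrative sketch only; names are hypothetical, not Spark APIs.
object GoodnessOfFitSketch {

  // Count how many samples fall into each of numBins equal-width bins on [lo, hi).
  // Samples outside the range are ignored.
  def histogram(samples: Seq[Double], lo: Double, hi: Double, numBins: Int): Array[Double] = {
    val counts = Array.fill(numBins)(0.0)
    val width = (hi - lo) / numBins
    samples.foreach { x =>
      if (x >= lo && x < hi) {
        val idx = math.min(((x - lo) / width).toInt, numBins - 1)
        counts(idx) += 1.0
      }
    }
    counts
  }

  // Uniform null hypothesis (assumption): every bin carries the same weight 1/size.
  def uniformWeights(size: Int): Array[Double] = Array.fill(size)(1.0 / size)

  // Pearson statistic: sum over bins of (observed - expected)^2 / expected,
  // where the expected weights are rescaled to the observed total.
  def pearsonStatistic(observed: Array[Double], expectedWeights: Array[Double]): Double = {
    require(observed.length == expectedWeights.length, "vectors must have the same size")
    require(expectedWeights.forall(_ > 0.0), "expected weights must be positive")
    val scale = observed.sum / expectedWeights.sum
    observed.zip(expectedWeights).map { case (obs, w) =>
      val exp = w * scale
      (obs - exp) * (obs - exp) / exp
    }.sum
  }
}
```

With four bins, counts of `Array(25.0, 25.0, 25.0, 25.0)` give a statistic of 0.0 against `uniformWeights(4)`, while peaked counts such as `Array(5.0, 45.0, 45.0, 5.0)` give 64.0. By contrast, `Array.tabulate(4)(i => i.toDouble / 4)` evaluates to `Array(0.0, 0.25, 0.5, 0.75)`. Those entries are unequal and start at zero, so under the assumptions above they would not describe equal bin probabilities.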

            People

            • Assignee: Unassigned
            • Reporter: Jieyuan Chen (chenjieyuan)
            • Votes: 0
            • Watchers: 1

                Time Tracking

                • Estimated: 0.5h (original estimate)
                • Remaining: 0.5h
                • Logged: Not Specified