  1. Spark
  2. SPARK-15656

ChiSqTest for goodness of fit tests against a wrong uniform distribution by default


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.5.1, 1.6.1
    • Fix Version/s: None
    • Component/s: MLlib
    • Labels:

      Description

      I've been running ChiSqTest to check whether my samples fit a uniform distribution.
      The documentation says that if a second vector to test against is not supplied as a parameter, the test runs against a uniform distribution. But when I pass samples drawn from a normal distribution, the calculated p-value is 1.0, which is wrong.
      The problem is that in ChiSqTest.scala, the `chiSquared` function generates a wrong uniform distribution when the expected vector is not supplied.
      The default generated samples should be:
          val expArr = if (expected.size == 0) Array.tabulate(size)(i => i.toDouble / size) else expected.toArray
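
      For context, here is a minimal sketch of how a Pearson goodness-of-fit statistic against a uniform expectation is usually computed. It is not the Spark source: `UniformGoodnessOfFit`, `chiSquaredStatistic` and `histogram` are hypothetical names, and the sketch assumes the observed vector holds per-category counts, with an empty expected vector meaning equal weight 1.0 / size for every category, rescaled to the observed total.

```scala
object UniformGoodnessOfFit {

  // Pearson chi-squared statistic for observed per-category counts.
  // Assumption: an empty `expected` means a uniform expectation,
  // i.e. weight 1.0 / size for every category.
  def chiSquaredStatistic(
      observed: Array[Double],
      expected: Array[Double] = Array.empty): Double = {
    require(observed.nonEmpty, "need at least one category")
    val size = observed.length
    val expArr = if (expected.isEmpty) Array.fill(size)(1.0 / size) else expected
    require(expArr.length == size, "observed and expected must have the same size")

    // Rescale the expected weights so they sum to the observed total.
    val scale = observed.sum / expArr.sum
    observed.zip(expArr).map { case (obs, exp) =>
      val e = exp * scale
      (obs - e) * (obs - e) / e
    }.sum
  }

  // Hypothetical helper: bin raw samples in [lo, hi] into `bins` counts,
  // since the statistic above works on counts, not on raw sample values.
  def histogram(samples: Seq[Double], lo: Double, hi: Double, bins: Int): Array[Double] = {
    val counts = Array.fill(bins)(0.0)
    val width = (hi - lo) / bins
    samples.foreach { x =>
      if (x >= lo && x <= hi) {
        // Clamp so that x == hi falls into the last bin.
        val idx = math.min(((x - lo) / width).toInt, bins - 1)
        counts(idx) += 1.0
      }
    }
    counts
  }

  def main(args: Array[String]): Unit = {
    val even = Array(25.0, 25.0, 25.0, 25.0)
    val skewed = Array(5.0, 45.0, 45.0, 5.0)
    println(chiSquaredStatistic(even))   // prints 0.0
    println(chiSquaredStatistic(skewed)) // prints 64.0
  }
}
```

      Under these assumptions, evenly spread counts give a statistic of 0, while counts concentrated in the middle bins give a large one. Raw sample values would first need to be binned into counts (for example with the `histogram` helper) before the statistic is meaningful.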

    Time Tracking

    • Original Estimate: 0.5h
    • Remaining Estimate: 0.5h
    • Time Spent: Not Specified