  Spark / SPARK-17697

BinaryLogisticRegressionSummary, GLM Summary should handle non-Double numeric types

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.1, 2.1.0
    • Fix Version/s: 2.0.2, 2.1.0
    • Component/s: ML
    • Labels: None

    Description

      Say you have a DataFrame with a label column of Integer type. You can fit a LogisticRegressionModel since LR handles casting to DoubleType internally.

      However, if you call evaluate() on the model, the line linked here does not cast the label column, so you get a runtime error (MatchError) for an invalid schema: https://github.com/apache/spark/blob/2cd327ef5e4c3f6b8468ebb2352479a1686b7888/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L863

      We should handle the cast there, and test evaluate() with other numeric types.
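
      For illustration, here is a minimal sketch of why a non-Double label produces a MatchError and how widening the label avoids it. It does not use Spark at all: Record, strictLabel and castLabel are hypothetical names made up for this example, and the pattern is only assumed to resemble the one at the linked line.

```scala
object LabelCastSketch {
  // Hypothetical stand-in for a (score, label) row; this is not Spark's Row type.
  final case class Record(score: Double, label: Any)

  // Mirrors the failing shape: the pattern only accepts a Double label,
  // so an Integer label matches no case and raises scala.MatchError.
  def strictLabel(r: Record): Double = r match {
    case Record(_, label: Double) => label
  }

  // Sketch of the fix: accept any boxed numeric label and widen it to Double.
  // Non-numeric labels still raise MatchError.
  def castLabel(r: Record): Double = r match {
    case Record(_, label: java.lang.Number) => label.doubleValue()
  }

  def main(args: Array[String]): Unit = {
    val intLabelled = Record(0.9, 1) // the label is an Int, boxed as java.lang.Integer
    println(scala.util.Try(strictLabel(intLabelled)).isFailure)
    println(castLabel(intLabelled))
  }
}
```

      In Spark itself the equivalent step would presumably be casting the label column to DoubleType (for example with Column.cast) before the rows are pattern matched, rather than matching on java.lang.Number.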

      *ALSO* We should check the rest of logreg and other algorithms for the same issue.

          People

            Assignee: Bryan Cutler (bryanc)
            Reporter: Joseph K. Bradley (josephkb)
            Shepherd: Joseph K. Bradley
            Votes: 0
            Watchers: 3