SPARK-10573

IndexToString transformSchema adds output field as DoubleType

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: 1.5.1, 1.6.0
    • Component/s: ML
    • Labels: None

      Description

      Reproducible example:

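      import org.apache.spark.ml.feature.IndexToString
      import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

      val stage = new IndexToString().setInputCol("input").setOutputCol("output")
      val inSchema = StructType(Seq(StructField("input", DoubleType)))
      val outSchema = stage.transformSchema(inSchema)
      // Fails on 1.5.0: the output field is reported as DoubleType, not StringType.
      assert(outSchema("output").dataType == StringType)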
      

      The root cause seems to be that transformSchema builds the output field with NominalAttribute.toStructField, which assumes DoubleType. It would probably be better to use SchemaUtils.appendColumn and set the data type explicitly.
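
      A minimal sketch of the suggested approach, assuming Spark SQL is on the classpath. SchemaUtils is internal to Spark, so the appendColumn helper below is a hypothetical stand-in written against the public StructType API. It is not the actual fix and not Spark's own implementation; the object name and the nullable = false choice are likewise assumptions made for illustration.

```scala
import org.apache.spark.sql.types.{DataType, DoubleType, StringType, StructField, StructType}

object AppendColumnSketch {
  // Hypothetical helper (not Spark's internal SchemaUtils): appends a field
  // with an explicitly chosen data type instead of deriving it from an attribute.
  def appendColumn(schema: StructType, colName: String, dataType: DataType): StructType = {
    require(!schema.fieldNames.contains(colName), s"Column $colName already exists.")
    // nullable = false is an assumption made for this sketch.
    StructType(schema.fields :+ StructField(colName, dataType, nullable = false))
  }

  def main(args: Array[String]): Unit = {
    val inSchema = StructType(Seq(StructField("input", DoubleType)))
    // The output column's type is stated explicitly as StringType.
    val outSchema = appendColumn(inSchema, "output", StringType)
    assert(outSchema("output").dataType == StringType)
    assert(outSchema("input").dataType == DoubleType)
  }
}
```

      Setting the type at the call site keeps the declared schema independent of how the attribute metadata is encoded, which is the property the failing assertion in the reproducible example checks.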

            People

            • Assignee: Nick Pritchard (pnpritchard)
            • Reporter: Nick Pritchard (pnpritchard)
            • Shepherd: Xiangrui Meng
            • Votes: 0
            • Watchers: 2
