  1. Spark
  2. SPARK-13641

getModelFeatures of ml.api.r.SparkRWrapper cannot (always) reveal the original column names


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ML, SparkR

    Description

      getModelFeatures in ml.api.r.SparkRWrapper cannot always recover the original column names. Take the HouseVotes84 data set as an example. The relevant branch of getModelFeatures is:

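      // Requires: import org.apache.spark.ml.attribute.AttributeGroup
      // XXXModel is a placeholder for the concrete fitted model type being matched.
      case m: XXXModel =>
        // Read the ML attribute metadata attached to the features column
        val attrs = AttributeGroup.fromStructField(
          m.summary.predictions.schema(m.summary.featuresCol))
        // These are the attribute names assigned while building the features vector, not the raw column names
        attrs.attributes.get.map(_.name.get)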
      

      The code above reads the feature names from the metadata of the features column. Usually, the features column is generated by RFormula, which internally uses a VectorAssembler; as a result, the output attribute names are not the same as the original column names.

      For example, we want to recover the HouseVotes84 feature names "V1, V2, ..., V16". With RFormula, however, we only get "V1_n, V2_y, ..., V16_y", because the transform function of VectorAssembler appends suffixes to the column names.
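      The issue does not prescribe a fix. As an illustration only, the following is a minimal post-processing sketch in plain Scala (no Spark dependency) that maps expanded names such as "V1_n" back to the original column names. It assumes that the original column names are known and that every expanded name is either an original name or "<originalColumn>_<suffix>". The object FeatureNames and the function originalColumnNames are hypothetical and are not part of Spark.

```scala
// Hypothetical helper, not part of Spark: maps expanded feature names such as
// "V1_n" back to the original column names they were derived from.
object FeatureNames {

  // Assumption: each expanded name is either an original column name or
  // "<originalColumn>_<suffix>". If several original columns match as a prefix
  // (e.g. columns "a" and "a_b" both match "a_b_x"), the longest one wins.
  def originalColumnNames(expanded: Seq[String], originals: Seq[String]): Seq[String] =
    expanded.map { name =>
      originals
        .filter(col => name == col || name.startsWith(col + "_"))
        .sortBy(col => -col.length) // longest matching column first
        .headOption
        .getOrElse(name) // names that match no original column are kept unchanged
    }.distinct // one entry per original column, in first-seen order

  def main(args: Array[String]): Unit = {
    val originals = (1 to 16).map(i => s"V$i")
    val expanded = Seq("V1_n", "V2_y", "V16_y")
    println(originalColumnNames(expanded, originals).mkString(", "))
  }
}
```

      Running main prints "V1, V2, V16". Note that "V16_y" is not attributed to "V1", because the match requires the column name to be followed directly by "_". The prefix rule is a heuristic: it can still be ambiguous when original column names themselves contain underscores, which is one reason carrying the original names in the attribute metadata would be more robust.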


People

    • Assignee: Unassigned
    • Reporter: Xusen Yin (yinxusen)
    • Votes: 0
    • Watchers: 3
