[SPARK-30939] StringIndexer setOutputCols does not set output cols - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0
Fix Version/s: 3.0.0
Component/s: ML
Labels: None
Target Version/s: 3.0.0

Description

(Credit to Brooke Wenig for finding it). Quoting:

"... The Python code works completely fine, but the Scala code is outputting

strIdx_8278ae6d55b3__output, strIdx_8278ae6d55b3__output, strIdx_8278ae6d55b3__output, strIdx_8278ae6d55b3__output, strIdx_8278ae6d55b3__output, strIdx_8278ae6d55b3__output, strIdx_8278ae6d55b3__output

for the output of the string indexer, instead of using the column names specified here:

val stringIndexer = new StringIndexer()
  .setInputCols(categoricalCols)
  .setOutputCols(indexOutputCols)
  .setHandleInvalid("skip")

I was expecting the resulting column names to be

indexOutputCols: Array[String] = Array(host_is_superhostIndex, cancellation_policyIndex, instant_bookableIndex, neighbourhood_cleansedIndex, property_typeIndex, room_typeIndex, bed_typeIndex)

Indeed, I'm pretty sure this is the bug:

  private def validateAndTransformField(
      schema: StructType,
      inputColName: String,
      outputColName: String): StructField = {
    val inputDataType = schema(inputColName).dataType
    require(inputDataType == StringType || inputDataType.isInstanceOf[NumericType],
      s"The input column $inputColName must be either string type or numeric type, " +
        s"but got $inputDataType.")
    require(schema.fields.forall(_.name != outputColName),
      s"Output column $outputColName already exists.")
    NominalAttribute.defaultAttr.withName($(outputCol)).toStructField()
  }

The last line does not use the outputColName argument passed in for each column; it uses the default single-output-column parameter, $(outputCol), so every output column gets the same default name.
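A minimal sketch of one plausible correction follows: the same method with the last line naming the attribute after outputColName. It is not necessarily what the linked pull request does. The wrapper object StringIndexerFixSketch, its main method, and the two-column sample schema are assumptions made for illustration; in the source the method is private. It needs spark-mllib on the classpath.

```scala
import org.apache.spark.ml.attribute.NominalAttribute
import org.apache.spark.sql.types.{NumericType, StringType, StructField, StructType}

// Hypothetical standalone wrapper so the method can be called directly.
object StringIndexerFixSketch {
  def validateAndTransformField(
      schema: StructType,
      inputColName: String,
      outputColName: String): StructField = {
    val inputDataType = schema(inputColName).dataType
    require(inputDataType == StringType || inputDataType.isInstanceOf[NumericType],
      s"The input column $inputColName must be either string type or numeric type, " +
        s"but got $inputDataType.")
    require(schema.fields.forall(_.name != outputColName),
      s"Output column $outputColName already exists.")
    // Name the attribute after this column's own output name, not $(outputCol).
    NominalAttribute.defaultAttr.withName(outputColName).toStructField()
  }

  def main(args: Array[String]): Unit = {
    // Sample schema using two of the column names from the report.
    val schema = StructType(Seq(
      StructField("room_type", StringType),
      StructField("bed_type", StringType)))
    val names = Seq("room_type" -> "room_typeIndex", "bed_type" -> "bed_typeIndex")
      .map { case (in, out) => validateAndTransformField(schema, in, out).name }
    println(names.mkString(", "))
  }
}
```

With this change each returned field should carry the name requested for its column, rather than the shared strIdx_..._output default.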

Issue Links

relates to: SPARK-11215 Add multiple columns support to StringIndexer (Resolved)

links to: GitHub Pull Request #27684

People

Assignee: Sean R. Owen
Reporter: Sean R. Owen

Dates

Created: 24/Feb/20 14:36
Updated: 25/Feb/20 02:18
Resolved: 25/Feb/20 02:18