Description
A Bucketizer with multiple input/output columns gets its single-column "outputCol" param explicitly set (to its default value) on a write -> read round trip, which causes it to throw an error on transform. Here's an example.
import org.apache.spark.ml.feature._

val splits = Array(Double.NegativeInfinity, 0, 10, 100, Double.PositiveInfinity)

val bucketizer = new Bucketizer()
  .setSplitsArray(Array(splits, splits))
  .setInputCols(Array("foo1", "foo2"))
  .setOutputCols(Array("bar1", "bar2"))

val data = Seq((1.0, 2.0), (10.0, 100.0), (101.0, -1.0)).toDF("foo1", "foo2")
bucketizer.transform(data)

val path = "/temp/bucketrizer-persist-test"
bucketizer.write.overwrite.save(path)
val bucketizerAfterRead = Bucketizer.read.load(path)
println(bucketizerAfterRead.isDefined(bucketizerAfterRead.outputCol))
// This line throws an error because "outputCol" is set
bucketizerAfterRead.transform(data)
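Note that isDefined returns true even on a freshly constructed Bucketizer, because outputCol has a default value; isSet is the stricter check that distinguishes an explicitly set value from a default, and an explicitly set value is what trips the multi-column validation after reload. A short sketch of the distinction, assuming the vals from the repro above are in scope:

println(bucketizer.isDefined(bucketizer.outputCol))  // true: outputCol has a default
println(bucketizer.isSet(bucketizer.outputCol))      // false: never explicitly set
// After the write -> read round trip, the reloaded instance reports the
// param as explicitly set, which is the bug:
println(bucketizerAfterRead.isSet(bucketizerAfterRead.outputCol))  // true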
And the trace:
java.lang.IllegalArgumentException: Bucketizer bucketizer_6f0acc3341f7 has the inputCols Param set for multi-column transform. The following Params are not applicable and should not be set: outputCol.
at org.apache.spark.ml.param.ParamValidators$.checkExclusiveParams$1(params.scala:300)
at org.apache.spark.ml.param.ParamValidators$.checkSingleVsMultiColumnParams(params.scala:314)
at org.apache.spark.ml.feature.Bucketizer.transformSchema(Bucketizer.scala:189)
at org.apache.spark.ml.feature.Bucketizer.transform(Bucketizer.scala:141)
at line251821108a8a433da484ee31f166c83725.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-6079631:17)
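A possible workaround until this is fixed, as an untested sketch: clear the spuriously set single-column param on the reloaded instance before calling transform. Params.clear removes a user-supplied value, so the single-vs-multi-column check should pass again. Continuing from the repro above (path and data already defined):

val reloaded = Bucketizer.read.load(path)
if (reloaded.isSet(reloaded.outputCol)) {
  // Drop the explicitly set single-column value that the read path
  // introduced, leaving only inputCols/outputCols set.
  reloaded.clear(reloaded.outputCol)
}
reloaded.transform(data)  // should no longer trip checkSingleVsMultiColumnParams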
Issue Links
- relates to SPARK-23455: Default Params in ML should be saved separately (Resolved)