Description
import org.apache.spark.ml.feature._

val df = spark.createDataFrame(Seq((2.3, 3.0), (Double.NaN, 3.0), (6.7, Double.NaN))).toDF("a", "b")
val splits = Array(Double.NegativeInfinity, 3.0, Double.PositiveInfinity)
val bucketizer: Bucketizer = new Bucketizer().setInputCol("a").setOutputCol("aa").setSplits(splits)
bucketizer.setHandleInvalid("skip")

scala> df.show
+---+---+
|  a|  b|
+---+---+
|2.3|3.0|
|NaN|3.0|
|6.7|NaN|
+---+---+

scala> bucketizer.transform(df).show
+---+---+---+
|  a|  b| aa|
+---+---+---+
|2.3|3.0|0.0|
+---+---+---+
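When handleInvalid is set to "skip", the last row of the input (6.7, NaN) is incorrectly dropped, even though column 'b' is not an input column of the Bucketizer. Only the second row, where the input column 'a' is NaN, should be skipped.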
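Until this is fixed, one possible workaround is to drop invalid rows by hand, restricted to the input column, and leave handleInvalid at its default. The sketch below is only an illustration and not the proposed fix: the names cleaned, strictBucketizer and result are made up for this example, and it assumes a local SparkSession (in spark-shell, getOrCreate() should simply return the existing session).

```scala
import org.apache.spark.ml.feature.Bucketizer
import org.apache.spark.sql.SparkSession

// Assumption: a local session is acceptable for this reproduction.
val spark = SparkSession.builder().master("local[*]").appName("bucketizer-skip-workaround").getOrCreate()

val df = spark.createDataFrame(Seq((2.3, 3.0), (Double.NaN, 3.0), (6.7, Double.NaN))).toDF("a", "b")
val splits = Array(Double.NegativeInfinity, 3.0, Double.PositiveInfinity)

// Drop rows that are null or NaN in the input column "a" only;
// a NaN in the unrelated column "b" is left alone.
val cleaned = df.na.drop(Seq("a"))

// handleInvalid keeps its default ("error"), which is safe here because
// "a" no longer contains NaN values.
val strictBucketizer = new Bucketizer().setInputCol("a").setOutputCol("aa").setSplits(splits)

val result = strictBucketizer.transform(cleaned)
result.show()
// Expected rows: (2.3, 3.0, 0.0) and (6.7, NaN, 1.0)
```

This is also the output one would expect from handleInvalid = "skip" itself: only the row whose input column holds NaN is removed.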