Description
I discovered this bug while working with a build from the master branch (which I believe is version 2.1.0). This used to work fine when running Spark 1.6.2.
I have a DataFrame with an "intData" column that contains values like
1 3 2 1 1 2 3 2 2 2 1 3
I have a stage in my pipeline that uses QuantileDiscretizer to produce equal-weight splits, like this:

new QuantileDiscretizer()
  .setInputCol("intData")
  .setOutputCol("intData_bin")
  .setNumBuckets(10)
  .fit(df)
But when that stage runs, it (incorrectly) throws this error:
parameter splits given invalid value [-Infinity, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, Infinity]
Duplicate values should not appear in the generated splits, should they?
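To illustrate where the duplicates come from, here is a minimal plain-Scala sketch (not Spark's actual implementation): asking for 10 quantile boundaries over a column with only 3 distinct values necessarily yields repeated boundary values, and collapsing the duplicates produces a strictly increasing splits array that Bucketizer-style validation would accept.

```scala
// Hypothetical sketch of quantile-based split generation on low-cardinality
// data. With 10 buckets over {1, 2, 3}, the raw quantile boundaries contain
// duplicates (e.g. 1.0, 1.0, 2.0, ...); deduplicating them yields a valid,
// strictly increasing splits array.
object SplitsSketch {
  def quantileSplits(data: Seq[Double], numBuckets: Int): Seq[Double] = {
    val sorted = data.sorted
    // Raw boundary at quantile i/numBuckets, for i = 1 .. numBuckets - 1.
    val raw = (1 until numBuckets).map { i =>
      val idx = math.min(sorted.length - 1,
                         (i.toDouble / numBuckets * sorted.length).toInt)
      sorted(idx)
    }
    // Collapse repeated boundaries so the splits are strictly increasing.
    val deduped = raw.distinct
    Double.NegativeInfinity +: deduped :+ Double.PositiveInfinity
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(1, 3, 2, 1, 1, 2, 3, 2, 2, 2, 1, 3).map(_.toDouble)
    val splits = quantileSplits(data, numBuckets = 10)
    println(splits.mkString(", "))
    // Every adjacent pair is strictly increasing, so validation passes.
    assert(splits.sliding(2).forall { case Seq(a, b) => a < b })
  }
}
```

With the deduplication step, the example data produces [-Infinity, 1.0, 2.0, 3.0, Infinity] instead of the invalid array in the error above; the effective number of buckets simply drops below the requested 10.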