Description
scala> data
res73: org.apache.spark.sql.DataFrame = [label: double, features: vector]

scala> data.count
res74: Long = 150

scala> val s = data.randomSplit(Array(1,2,-0.01))
s: Array[org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]] = Array([label: double, features: vector], [label: double, features: vector], [label: double, features: vector])

scala> s(0).count
res75: Long = 51

scala> s(2).count
16/08/03 18:28:27 ERROR Executor: Exception in task 0.0 in stage 76.0 (TID 66)
java.lang.IllegalArgumentException: requirement failed: Upper bound (1.0033444816053512) must be <= 1.0
        at scala.Predef$.require(Predef.scala:224)

scala> data.sample(false, -0.01)
res80: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [label: double, features: vector]

scala> data.sample(false, -0.01).count
16/08/03 18:30:33 ERROR Executor: Exception in task 0.0 in stage 84.0 (TID 71)
java.lang.IllegalArgumentException: requirement failed: Lower bound (0.0) must be <= upper bound (-0.01)
val s = data.randomSplit(Array(1,2,-0.01)) runs successfully, and even s(0).count succeeds in the lines that follow; the IllegalArgumentException only surfaces once an action touches the split built from the negative weight (s(2).count). randomSplit should validate its weights eagerly and fail as soon as it is called with a negative weight, as sketched below.
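As an illustration only, here is a minimal sketch of the eager check being requested, written as a wrapper around the public API; safeRandomSplit is an assumed helper name, not part of Spark, and the require messages are illustrative:

import org.apache.spark.sql.{DataFrame, Dataset, Row}

// Hypothetical wrapper (not Spark source): validate the weights up front,
// so a negative weight fails at call time rather than at the first action.
def safeRandomSplit(df: DataFrame, weights: Array[Double]): Array[Dataset[Row]] = {
  require(weights.forall(_ >= 0),
    s"Weights must be nonnegative, but got ${weights.mkString("[", ",", "]")}")
  require(weights.sum > 0,
    s"Sum of weights must be positive, but got ${weights.mkString("[", ",", "]")}")
  df.randomSplit(weights)
}

With this wrapper, safeRandomSplit(data, Array(1, 2, -0.01)) throws IllegalArgumentException immediately instead of waiting for s(2).count.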
data.sample(false, -0.01) should likewise fail immediately when it is called, instead of deferring the error to the first action (data.sample(false, -0.01).count). A corresponding sketch follows.
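The same pattern applies to sample; safeSample is likewise an assumed helper name, not part of the Spark API:

// Hypothetical wrapper (not Spark source): reject a negative fraction at call time.
def safeSample(df: DataFrame, withReplacement: Boolean, fraction: Double): Dataset[Row] = {
  require(fraction >= 0.0, s"Sampling fraction must be nonnegative, but got $fraction")
  df.sample(withReplacement, fraction)
}

Here safeSample(data, false, -0.01) fails immediately rather than at the later .count.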