Spark / SPARK-16875

Add args checking for DataSet randomSplit and sample


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.1, 2.1.0
    • Component/s: SQL
    • Labels: None

    Description

      scala> data
      res73: org.apache.spark.sql.DataFrame = [label: double, features: vector]
      
      scala> data.count
      res74: Long = 150
      
      scala> val s = data.randomSplit(Array(1,2,-0.01))
      s: Array[org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]] = Array([label: double, features: vector], [label: double, features: vector], [label: double, features: vector])
      
      scala> s(0).count
      res75: Long = 51
      
      scala> s(2).count
      16/08/03 18:28:27 ERROR Executor: Exception in task 0.0 in stage 76.0 (TID 66)
      java.lang.IllegalArgumentException: requirement failed: Upper bound (1.0033444816053512) must be <= 1.0
      	at scala.Predef$.require(Predef.scala:224)
      
      scala> data.sample(false, -0.01)
      res80: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [label: double, features: vector]
      
      scala> data.sample(false, -0.01).count
      16/08/03 18:30:33 ERROR Executor: Exception in task 0.0 in stage 84.0 (TID 71)
      java.lang.IllegalArgumentException: requirement failed: Lower bound (0.0) must be <= upper bound (-0.01)
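
      The out-of-range upper bound in the first stack trace can be reproduced outside Spark. The following is a sketch of the weight normalization randomSplit performs (the exact internals are an assumption); a negative weight pushes one of the cumulative bounds above 1.0, which is only detected when that split is evaluated:

      ```scala
      // Normalize the weights and build cumulative split boundaries,
      // as randomSplit does before sampling each partition.
      val weights = Array(1.0, 2.0, -0.01)
      val sum = weights.sum                                  // 2.99
      val bounds = weights.map(_ / sum).scanLeft(0.0)(_ + _)
      // bounds(2) = (1.0 + 2.0) / 2.99 ≈ 1.0033444816053512,
      // which exceeds 1.0 and later fails the "Upper bound must be <= 1.0" check.
      ```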
      

      val s = data.randomSplit(Array(1, 2, -0.01)) runs successfully, and even s(0).count works in the following lines; the invalid weight only surfaces as an executor error once s(2) is evaluated.
      data.sample(false, -0.01) should likewise fail immediately: both methods should validate their arguments eagerly instead of deferring the failure to the first action.
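
      The eager checks this issue asks for could look like the sketch below. ArgCheck and its two helper names are hypothetical, not Spark API; the point is that require fires at call time, before any job is scheduled:

      ```scala
      object ArgCheck {
        // Reject a negative sampling fraction up front.
        def checkSampleFraction(fraction: Double): Unit =
          require(fraction >= 0.0,
            s"Fraction must be nonnegative, but got $fraction")

        // Reject negative or all-zero split weights up front.
        def checkSplitWeights(weights: Array[Double]): Unit = {
          require(weights.forall(_ >= 0),
            s"Weights must be nonnegative, but got ${weights.mkString("[", ",", "]")}")
          require(weights.sum > 0,
            s"Sum of weights must be positive, but got ${weights.mkString("[", ",", "]")}")
        }
      }
      ```

      With such checks in place, data.sample(false, -0.01) and data.randomSplit(Array(1, 2, -0.01)) would throw IllegalArgumentException at the call site rather than inside a later task.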


          People

            Assignee: Ruifeng Zheng (podongfeng)
            Reporter: Ruifeng Zheng (podongfeng)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: