[SPARK-21550] approxQuantiles throws "next on empty iterator" on empty data - ASF JIRA


Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.1.0
Fix Version/s: 2.2.0
Component/s: SQL
Labels: None

Description

The documentation says:

null and NaN values will be removed from the numerical column before calculation. If
the dataframe is empty or the column only contains null or NaN, an empty array is returned.
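The documented contract can be illustrated with a small pure-Python sketch. The helper name quantiles_per_documented_contract is hypothetical, and it computes exact nearest-rank quantiles rather than Spark's approximate algorithm. Only the handling of null (None), NaN and empty input is meant to mirror the documentation quoted above.

```python
import math
from typing import List, Optional, Sequence


def quantiles_per_documented_contract(
    values: Sequence[Optional[float]], probabilities: Sequence[float]
) -> List[float]:
    """Hypothetical illustration of the documented approxQuantile contract.

    Uses exact nearest-rank quantiles instead of Spark's approximate
    algorithm; only the null/NaN/empty handling follows the docs.
    """
    # Remove null (None) and NaN values before calculation, as documented.
    clean = sorted(v for v in values if v is not None and not math.isnan(v))
    # Empty input, or only null/NaN values: return an empty list, do not raise.
    if not clean:
        return []
    result = []
    for p in probabilities:
        # Nearest rank: smallest 1-based rank covering a fraction p of the data.
        rank = max(1, math.ceil(p * len(clean)))
        result.append(clean[rank - 1])
    return result
```

Under this contract, quantiles_per_documented_contract([], [0.99]) and quantiles_per_documented_contract([None, float("nan")], [0.99]) both return [].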

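However, this small PySpark example, where sql_context is an existing SQLContext, filters out every row of range(10) (ids 0 to 9, so none equals 42) and then calls approxQuantile on the empty result:

from pyspark.sql.functions import col

sql_context.range(10).filter(col("id") == 42).approxQuantile("id", [0.99], 0.001)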

throws

Py4JJavaError: An error occurred while calling o3493.approxQuantile.
: java.util.NoSuchElementException: next on empty iterator
	at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
	at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
	at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
	at scala.collection.IterableLike$class.head(IterableLike.scala:107)
	at scala.collection.mutable.ArrayOps$ofRef.scala$collection$IndexedSeqOptimized$$super$head(ArrayOps.scala:186)
	at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:126)
	at scala.collection.mutable.ArrayOps$ofRef.head(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.last(TraversableLike.scala:431)
	at scala.collection.mutable.ArrayOps$ofRef.scala$collection$IndexedSeqOptimized$$super$last(ArrayOps.scala:186)
	at scala.collection.IndexedSeqOptimized$class.last(IndexedSeqOptimized.scala:132)
	at scala.collection.mutable.ArrayOps$ofRef.last(ArrayOps.scala:186)
	at org.apache.spark.sql.catalyst.util.QuantileSummaries.query(QuantileSummaries.scala:207)
	at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$multipleApproxQuantiles$1$$anonfun$apply$1.apply$mcDD$sp(StatFunctions.scala:92)
	at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$multipleApproxQuantiles$1$$anonfun$apply$1.apply(StatFunctions.scala:92)
	at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$multipleApproxQuantiles$1$$anonfun$apply$1.apply(StatFunctions.scala:92)
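The trace shows QuantileSummaries.query calling last on an empty array, which raises the NoSuchElementException. On affected versions, one possible workaround is to check for an empty DataFrame before calling approxQuantile. The sketch below is hypothetical (approx_quantile_or_empty is not a Spark API). It assumes df is a pyspark.sql.DataFrame and uses only DataFrame.head and DataFrame.approxQuantile. It covers only the empty-DataFrame case from this report, not a column that contains only null or NaN values.

```python
from typing import Any, List, Sequence


def approx_quantile_or_empty(
    df: Any, column: str, probabilities: Sequence[float], relative_error: float
) -> List[float]:
    """Hypothetical guard for the empty-DataFrame case of SPARK-21550.

    Returns the empty list the documentation promises instead of letting
    approxQuantile fail with "next on empty iterator".
    """
    # head(1) returns at most one row; an empty list means no rows at all.
    if not df.head(1):
        return []
    return df.approxQuantile(column, list(probabilities), relative_error)
```

With the example above, approx_quantile_or_empty(sql_context.range(10).filter(col("id") == 42), "id", [0.99], 0.001) would return [] without reaching QuantileSummaries.query.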


Issue Links

is duplicated by: SPARK-19573 Make NaN/null handling consistent in approxQuantile (Resolved)


People

Assignee: Unassigned

Reporter: peay

Votes: 0

Watchers: 2

Dates

Created: 27/Jul/17 17:00

Updated: 23/Feb/18 10:59

Resolved: 27/Jul/17 18:23