Spark / SPARK-21550

approxQuantile throws "next on empty iterator" on empty data

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • 2.1.0
    • 2.2.0
    • Component/s: SQL
    • None

    Description

      The documentation says:

      null and NaN values will be removed from the numerical column before calculation. If
      the dataframe is empty or the column only contains null or NaN, an empty array is returned.
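
Read literally, the documented contract amounts to something like the sketch below. It is pure Python and only illustrative: the function name `quantiles_per_documentation` is hypothetical, and an exact nearest-rank lookup stands in for Spark's approximate summary, so only the empty-input behaviour is meant to mirror the documentation.

```python
import math


def quantiles_per_documentation(values, probabilities):
    """Exact-quantile stand-in for the documented contract (not Spark's algorithm)."""
    # Remove null (None) and NaN entries before any calculation.
    cleaned = sorted(
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    )
    # Documented behaviour: nothing left to summarise yields an empty result,
    # not an exception.
    if not cleaned:
        return []
    # Nearest-rank lookup, clamped to valid indices.
    last = len(cleaned) - 1
    return [
        float(cleaned[min(last, max(0, math.ceil(p * len(cleaned)) - 1))])
        for p in probabilities
    ]


# Same shape as the report: ten ids, a filter that matches none of them.
rows = [i for i in range(10) if i == 42]
result = quantiles_per_documentation(rows, [0.99])
```

Under this reading, `result` is an empty list for the filtered-out input, which is what the documentation promises.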
      

      However, this small pyspark example

      from pyspark.sql.functions import col
      sql_context.range(10).filter(col("id") == 42).approxQuantile("id", [0.99], 0.001)
      

      throws

      Py4JJavaError: An error occurred while calling o3493.approxQuantile.
      : java.util.NoSuchElementException: next on empty iterator
      	at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
      	at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
      	at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
      	at scala.collection.IterableLike$class.head(IterableLike.scala:107)
      	at scala.collection.mutable.ArrayOps$ofRef.scala$collection$IndexedSeqOptimized$$super$head(ArrayOps.scala:186)
      	at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:126)
      	at scala.collection.mutable.ArrayOps$ofRef.head(ArrayOps.scala:186)
      	at scala.collection.TraversableLike$class.last(TraversableLike.scala:431)
      	at scala.collection.mutable.ArrayOps$ofRef.scala$collection$IndexedSeqOptimized$$super$last(ArrayOps.scala:186)
      	at scala.collection.IndexedSeqOptimized$class.last(IndexedSeqOptimized.scala:132)
      	at scala.collection.mutable.ArrayOps$ofRef.last(ArrayOps.scala:186)
      	at org.apache.spark.sql.catalyst.util.QuantileSummaries.query(QuantileSummaries.scala:207)
      	at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$multipleApproxQuantiles$1$$anonfun$apply$1.apply$mcDD$sp(StatFunctions.scala:92)
      	at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$multipleApproxQuantiles$1$$anonfun$apply$1.apply(StatFunctions.scala:92)
      	at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$multipleApproxQuantiles$1$$anonfun$apply$1.apply(StatFunctions.scala:92)
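
The trace shows `QuantileSummaries.query` calling `last` on an empty array, so one possible caller-side workaround on affected versions is to check for usable rows before asking for quantiles. The helper below is a minimal sketch, not part of the Spark API: the name `approx_quantile_or_empty` is hypothetical, and it assumes `dropna` removes null values (and, as far as I know, NaN values in floating-point columns) for the given column.

```python
def approx_quantile_or_empty(df, column, probabilities, relative_error):
    """Hypothetical guard: return [] instead of raising when no usable rows exist."""
    # Keep only rows with a usable value in `column`.
    cleaned = df.dropna(subset=[column])
    # head(1) returns a list with at most one row; an empty list means
    # there is nothing to summarise.
    if not cleaned.head(1):
        return []
    return cleaned.approxQuantile(column, probabilities, relative_error)
```

With the DataFrame from the report, `approx_quantile_or_empty(sql_context.range(10).filter(col("id") == 42), "id", [0.99], 0.001)` would take the early-return branch. The check costs one extra small Spark job per call.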
      

            People

              Assignee: Unassigned
              Reporter: peay