Details
-
Bug
-
Status: Resolved
-
Minor
-
Resolution: Duplicate
-
2.1.0
-
None
-
None
Description
I wish I had more of a simple reproducible case to give, but I got the below exception while selecting null values in one of the columns of a dataframe.
My client code that failed was
df.filter(filterExp).count()
where the filter expression was something like "someCall.isin() || someCall.isNulll". Based on the stack trace, I think the problem may be the "isin()" in the expression - since there are no values to match in it.
There were 412 nulls out of 716,000 total rows for the column being filtered.
The exception seems to indicate that spark is trying to do reduceLeft on an empty list, but the dataset is not empty.
java.lang.UnsupportedOperationException: empty.reduceLeftscala.collection.LinearSeqOptimized$class.reduceLeft(LinearSeqOptimized.scala:137) scala.collection.immutable.List.reduceLeft(List.scala:84) scala.collection.TraversableOnce$class.reduce(TraversableOnce.scala:208) scala.collection.AbstractTraversable.reduce(Traversable.scala:104) org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$1.applyOrElse(InMemoryTableScanExec.scala:90) org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$1.applyOrElse(InMemoryTableScanExec.scala:54) scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$1.applyOrElse(InMemoryTableScanExec.scala:61) org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$1.applyOrElse(InMemoryTableScanExec.scala:54) scala.PartialFunction$Lifted.apply(PartialFunction.scala:223) scala.PartialFunction$Lifted.apply(PartialFunction.scala:219) org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$2.apply(InMemoryTableScanExec.scala:95) org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$2.apply(InMemoryTableScanExec.scala:94) scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) scala.collection.immutable.List.foreach(List.scala:381) scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) scala.collection.immutable.List.flatMap(List.scala:344) org.apache.spark.sql.execution.columnar.InMemoryTableScanExec.(InMemoryTableScanExec.scala:94) org.apache.spark.sql.execution.SparkStrategies$InMemoryScans$$anonfun$6.apply(SparkStrategies.scala:306) org.apache.spark.sql.execution.SparkStrategies$InMemoryScans$$anonfun$6.apply(SparkStrategies.scala:306) org.apache.spark.sql.execution.SparkPlanner.pruneFilterProject(SparkPlanner.scala:96) org.apache.spark.sql.execution.SparkStrategies$InMemoryScans$.apply(SparkStrategies.scala:302) org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62) org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62) scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434) scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439) org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92) org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:77) org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:74) scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157) scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157) scala.collection.Iterator$class.foreach(Iterator.scala:893) scala.collection.AbstractIterator.foreach(Iterator.scala:1336) scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157) scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336) org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:74) org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:66) scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434) scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92) org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:77) org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:74) scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157) scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157) scala.collection.Iterator$class.foreach(Iterator.scala:893) scala.collection.AbstractIterator.foreach(Iterator.scala:1336) scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157) scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336) org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:74) org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:66) scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434) scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92) org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:79) org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:75) org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:84) org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:84) org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2774) org.apache.spark.sql.Dataset.count(Dataset.scala:2404) mypackage.Selection(Selection.scala:34)
Attachments
Issue Links
- duplicates
-
SPARK-22249 UnsupportedOperationException: empty.reduceLeft when caching a dataframe
- Resolved