Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
0.9.0
None
Description
When reading KV pairs back from disk, ExternalAppendOnlyMap maintains a StreamBuffer for each spilled file. These StreamBuffers are ordered by key hash code, and a hash of Int.MAX_VALUE signifies that the corresponding StreamBuffer is empty.
However, Int.MAX_VALUE is a perfectly legitimate hash value. If a key hashes to this value, ExternalAppendOnlyMap cannot distinguish an empty StreamBuffer from a StreamBuffer that contains only this key. As a result, a NoSuchElementException is thrown (see https://github.com/apache/incubator-spark/blob/95d28ff3d0d20d9c583e184f9e2c5ae842d8a4d9/core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala#L304):
java.util.NoSuchElementException (java.util.NoSuchElementException)
org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:277)
org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:212)
org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:29)
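
The collision described above can be illustrated with a minimal sketch. SketchBuffer and its methods are hypothetical stand-ins invented for this example, not Spark's actual StreamBuffer. The sketch assumes only what the description states: an empty buffer reports Int.MaxValue as its hash. The Option-based variant shows one possible way to remove the ambiguity and is not necessarily the fix that was applied.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a per-spill-file buffer; not Spark's real class.
final class SketchBuffer(val pairs: ArrayBuffer[(Any, Int)]) {
  // Sentinel scheme from the description: Int.MaxValue means "empty".
  def minKeyHashWithSentinel: Int =
    if (pairs.isEmpty) Int.MaxValue else pairs.head._1.hashCode

  // Possible alternative (an assumption): represent emptiness explicitly.
  def minKeyHash: Option[Int] = pairs.headOption.map(_._1.hashCode)
}

object SentinelCollision {
  def main(args: Array[String]): Unit = {
    val empty = new SketchBuffer(ArrayBuffer.empty[(Any, Int)])
    // The key Int.MaxValue hashes to Int.MaxValue itself.
    val single = new SketchBuffer(ArrayBuffer[(Any, Int)]((Int.MaxValue, 1)))

    // With the sentinel, both buffers report the same hash, so the
    // non-empty buffer is indistinguishable from the empty one.
    println(empty.minKeyHashWithSentinel == single.minKeyHashWithSentinel)

    // With Option, None (empty) differs from Some(Int.MaxValue).
    println(empty.minKeyHash != single.minKeyHash)
  }
}
```

Any key whose hashCode equals Int.MaxValue triggers the same ambiguity; the integer key is used here only because its hash is easy to predict.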
Issue Links
- is duplicated by SPARK-1045 ExternalAppendOnlyMap Iterator throw no such element on joining two large rdd (Resolved)