  1. Spark
  2. SPARK-1113

External Spilling Bug - hash collision causes NoSuchElementException

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.9.1, 1.0.0
    • Component/s: Shuffle, Spark Core
    • Labels: None

    Description

      When reading KV pairs back from disk, ExternalAppendOnlyMap maintains a StreamBuffer for each spilled file. These StreamBuffers are ordered by key hash code, and a hash of Int.MAX_VALUE signifies that the corresponding StreamBuffer is empty.
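
The ordering described above can be sketched as follows. This is a simplified illustration, not Spark's actual code: the `StreamBuffer` class here, its `pairs` field, `minKeyHash`, and `drainOrder` are hypothetical stand-ins, and the sentinel is written `Int.MaxValue`, which is how Scala spells the constant.

```scala
import scala.collection.mutable

object SentinelOrderingSketch {
  // Hypothetical stand-in for a per-file StreamBuffer; not Spark's actual class.
  final class StreamBuffer(val pairs: mutable.ArrayBuffer[(Int, String)]) {
    // Smallest key hash in the buffer, or the sentinel Int.MaxValue when empty.
    def minKeyHash: Int =
      if (pairs.isEmpty) Int.MaxValue else pairs.map(_._1.hashCode).min
  }

  // mutable.PriorityQueue dequeues the largest element first, so the ordering
  // is reversed to dequeue the buffer with the smallest key hash first.
  val byMinHash: Ordering[StreamBuffer] =
    Ordering.by[StreamBuffer, Int](_.minKeyHash).reverse

  // Returns the minimum key hash of each buffer in the order the queue yields them.
  def drainOrder(buffers: Seq[StreamBuffer]): Seq[Int] = {
    val queue = mutable.PriorityQueue(buffers: _*)(byMinHash)
    val out = mutable.ArrayBuffer.empty[Int]
    while (queue.nonEmpty) out += queue.dequeue().minKeyHash
    out.toList
  }
}
```

Under this scheme an empty buffer always sorts last, because no hash can exceed the sentinel.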

      However, Int.MAX_VALUE is a perfectly legitimate hash value. If a key's hash code equals Int.MAX_VALUE, ExternalAppendOnlyMap cannot differentiate between an empty StreamBuffer and a StreamBuffer that contains only that key. As a result, a NoSuchElementException is thrown (see https://github.com/apache/incubator-spark/blob/95d28ff3d0d20d9c583e184f9e2c5ae842d8a4d9/core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala#L304):

      java.util.NoSuchElementException (java.util.NoSuchElementException)
      org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:277)
      org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:212)
      org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:29)
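
The ambiguity behind this exception can be reproduced in isolation. The sketch below is a minimal model under stated assumptions: `StreamBuffer`, `looksEmptyBySentinel`, and `isEmpty` are hypothetical names rather than Spark's API, and the explicit emptiness check is only one possible remedy. The ticket itself does not describe how the bug was fixed.

```scala
import scala.collection.mutable

object SentinelCollisionSketch {
  // Hypothetical simplified buffer; the names are illustrative, not Spark's API.
  final class StreamBuffer(val pairs: mutable.ArrayBuffer[(Int, String)]) {
    // Smallest key hash in the buffer, or the sentinel Int.MaxValue when empty.
    def minKeyHash: Int =
      if (pairs.isEmpty) Int.MaxValue else pairs.map(_._1.hashCode).min

    // Ambiguous check: infers emptiness from the sentinel hash value.
    def looksEmptyBySentinel: Boolean = minKeyHash == Int.MaxValue

    // One possible remedy (an assumption): ask the buffer itself, not its hash.
    def isEmpty: Boolean = pairs.isEmpty
  }

  def main(args: Array[String]): Unit = {
    // An Int key's hashCode is the Int itself, so this key hashes to Int.MaxValue.
    val onlyMaxKey = new StreamBuffer(mutable.ArrayBuffer((Int.MaxValue, "v")))
    val empty = new StreamBuffer(mutable.ArrayBuffer.empty[(Int, String)])

    println(onlyMaxKey.looksEmptyBySentinel) // true, although one pair is present
    println(empty.looksEmptyBySentinel)      // true
    println(onlyMaxKey.isEmpty)              // false
    println(empty.isEmpty)                   // true
  }
}
```

Both buffers report the same sentinel hash, so any logic that relies on the hash alone treats them identically. Checking emptiness directly keeps the two cases apart.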

            People

              Assignee: Andrew Or (andrewor14)
              Reporter: Andrew Or (andrewor14)