Spark / SPARK-12087

DStream.saveAsHadoopFiles can throw ConcurrentModificationException

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.1, 1.4.1, 1.5.2
    • Fix Version/s: 1.4.2, 1.5.3, 1.6.0
    • Component/s: DStreams
    • Labels: None

    Description

      The JobConf object created in DStream.saveAsHadoopFiles is used concurrently in multiple places:

      • The JobConf is updated by RDD.saveAsHadoopFile() before the job is launched
      • The JobConf is serialized as part of the DStream checkpoints.

      These concurrent accesses (one thread updating the JobConf while another thread serializes it) can lead to a ConcurrentModificationException in the Java HashMap used inside the underlying Hadoop Configuration object.
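      The failure mode can be illustrated without Spark or Hadoop. This is a minimal sketch, assuming a plain java.util.HashMap stands in for the Configuration's internal property map, and simulating the two threads deterministically on a single thread: serialize walks every entry the way a checkpoint writer would, and a callback run between entries plays the role of the save path updating the conf. The class and method names (ConfRaceSketch, serialize, baseConf, sharedConfThrows, serializeWithPerBatchCopy) and the mapred.* key values are hypothetical, chosen only for illustration. The second case shows one defensive pattern, giving each batch its own copy so the checkpointed map is never modified during the walk; it is not presented as the project's actual fix.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-threaded sketch of the JobConf race. A plain HashMap
// stands in for the property map inside a Hadoop Configuration (assumption).
final class ConfRaceSketch {

    // Walks every entry, the way a serializer writing a checkpoint would.
    // `concurrentUpdate` runs between entries to simulate, deterministically,
    // another thread updating a map in the middle of the walk.
    static String serialize(Map<String, String> conf, Runnable concurrentUpdate) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            out.append(e.getKey()).append('=').append(e.getValue()).append('\n');
            concurrentUpdate.run();
        }
        return out.toString();
    }

    // Illustrative starting configuration with two entries.
    static Map<String, String> baseConf() {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapred.output.key.class", "Text");
        conf.put("mapred.output.value.class", "IntWritable");
        return conf;
    }

    // Buggy pattern: the map being serialized is the same one the save step mutates.
    // Adding a new key mid-walk bumps the map's modification count, so the
    // fail-fast iterator throws on its next step.
    static boolean sharedConfThrows() {
        Map<String, String> shared = baseConf();
        try {
            serialize(shared, () -> shared.put("mapred.output.dir", "/out/batch-1"));
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    // Defensive pattern: each batch mutates its own copy, taken before any
    // mutation, so the checkpointed map is never structurally modified.
    static String serializeWithPerBatchCopy() {
        Map<String, String> checkpointed = baseConf();
        Map<String, String> perBatch = new HashMap<>(checkpointed);
        return serialize(checkpointed, () -> perBatch.put("mapred.output.dir", "/out/batch-1"));
    }

    public static void main(String[] args) {
        System.out.println("shared conf throws ConcurrentModificationException: " + sharedConfThrows());
        System.out.print(serializeWithPerBatchCopy());
    }
}
```

      Copying per batch costs one map allocation per batch but removes any sharing of a mutable object between the save path and the checkpoint path. Locking the shared conf is another option, but it only works if every reader and writer, including the serializer, agrees on the same lock.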

          People

            Assignee: Tathagata Das
            Reporter: Tathagata Das
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: