Spark / SPARK-12087

DStream.saveAsHadoopFiles can throw ConcurrentModificationException


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.1, 1.4.1, 1.5.2
    • Fix Version/s: 1.4.2, 1.5.3, 1.6.0
    • Component/s: DStreams
    • Labels: None
    • Target Version/s:

    Description

    The JobConf object created in DStream.saveAsHadoopFiles is used concurrently in two places:

    • RDD.saveAsHadoopFile() updates the JobConf before the job is launched.
    • The JobConf is serialized as part of the DStream checkpoints.

    These concurrent accesses (one thread updating the JobConf while another thread serializes it) can lead to a ConcurrentModificationException in the Java HashMap used internally by the Hadoop Configuration object.
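    The failure mode can be reproduced in miniature. The sketch below is not Spark or Hadoop code: a plain java.util.HashMap stands in for the Configuration's internal map, and the race is replayed deterministically on one thread (the checkpoint side starts iterating, the job-launch side adds a key, the checkpoint side resumes). The class and method names and the key mapred.output.dir are illustrative assumptions. The sketch also shows one way to avoid the conflict, giving each batch its own copy of the configuration. That approach is assumed here and is not a description of the committed fix.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical model of SPARK-12087: a HashMap stands in for the Hadoop
// Configuration's internal state. This is NOT Spark or Hadoop code.
class JobConfRace {

    // Hypothetical stand-in for a JobConf holding a few settings.
    static Map<String, String> sampleConf() {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.defaultFS", "hdfs://namenode:8020");
        conf.put("mapred.job.name", "streaming-batch");
        conf.put("io.file.buffer.size", "65536");
        return conf;
    }

    // Replays one unlucky interleaving on a single thread: the "checkpoint"
    // side begins iterating over `checkpointed`, the "job launch" side adds a
    // new key to `launched`, then the checkpoint side resumes.
    // Returns true if the iteration finished without an exception.
    static boolean checkpointWhileLaunching(Map<String, String> checkpointed,
                                            Map<String, String> launched) {
        Iterator<Map.Entry<String, String>> it = checkpointed.entrySet().iterator();
        it.next(); // serialization has visited the first entry
        // Adding a previously absent key is a structural modification.
        launched.put("mapred.output.dir", "/output/batch-1000");
        try {
            while (it.hasNext()) {
                it.next(); // the fail-fast iterator detects the change here
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // One shared object is both checkpointed and updated: iteration fails.
        Map<String, String> shared = sampleConf();
        System.out.println("shared conf:    " + checkpointWhileLaunching(shared, shared));

        // The launch path mutates its own per-batch copy, so the map being
        // checkpointed is never modified while it is being iterated.
        Map<String, String> original = sampleConf();
        Map<String, String> perBatchCopy = new HashMap<>(original);
        System.out.println("per-batch copy: " + checkpointWhileLaunching(original, perBatchCopy));
    }
}
```

    Running main prints false for the shared map and true for the per-batch copy. HashMap's fail-fast check is best-effort, so with real threads the exception is not raised on every run, which is why this kind of bug tends to appear only intermittently.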

    People

    • Assignee: Tathagata Das (tdas)
    • Reporter: Tathagata Das (tdas)
    • Votes: 0
    • Watchers: 4

    Dates

    • Created:
    • Updated:
    • Resolved: