SPARK-6752 (project: Spark)

Allow StreamingContext to be recreated from checkpoint and existing SparkContext


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.1.1, 1.2.1, 1.3.1
    • Fix Version/s: 1.4.0
    • Component/s: DStreams
    • Labels: None

    Description

      Currently, if you want to create a StreamingContext from checkpoint information, the system creates a new SparkContext. This prevents StreamingContext from being recreated from checkpoints in managed environments where the SparkContext is pre-created.

      Proposed solution: introduce the following constructors and methods on StreamingContext:

      1. new StreamingContext(checkpointDirectory, sparkContext)

      • Recreate StreamingContext from checkpoint using the provided SparkContext

      2. new StreamingContext(checkpointDirectory, hadoopConf, sparkContext)

      • Recreate StreamingContext from checkpoint using the provided SparkContext, and use the provided Hadoop configuration (hadoopConf) to read the checkpoint

      3. StreamingContext.getOrCreate(checkpointDirectory, sparkContext, createFunction: SparkContext => StreamingContext)

      • If a checkpoint file exists, recreate the StreamingContext using the provided SparkContext (that is, call 1.); otherwise, create the StreamingContext by calling createFunction with the provided SparkContext

      The corresponding Java and Python APIs must be added as well.
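      The control flow of proposal 3, with proposals 1 and 2 as its recovery path, can be sketched as a minimal self-contained Python model. This is not Spark's API: SparkContextStub, StreamingContextStub, write_checkpoint, read_local_checkpoint, and the single-file CHECKPOINT_FILE layout are all hypothetical, and a pluggable reader function stands in for hadoopConf. What it illustrates is that both branches of get_or_create reuse the caller's pre-created SparkContext instead of constructing a new one.

```python
import os
import pickle
import tempfile
from typing import Callable, Optional

# Hypothetical single-file checkpoint layout; Spark's real checkpoint format differs.
CHECKPOINT_FILE = "checkpoint"


class SparkContextStub:
    """Hypothetical stand-in for a SparkContext pre-created by a managed environment."""

    def __init__(self, app_name: str) -> None:
        self.app_name = app_name


def write_checkpoint(checkpoint_dir: str, state: dict) -> None:
    """Hypothetical helper: persist streaming state so it can be recovered later."""
    with open(os.path.join(checkpoint_dir, CHECKPOINT_FILE), "wb") as f:
        pickle.dump(state, f)


def read_local_checkpoint(checkpoint_dir: str) -> dict:
    """Default reader; a custom reader plays the role of hadoopConf (proposal 2)."""
    with open(os.path.join(checkpoint_dir, CHECKPOINT_FILE), "rb") as f:
        return pickle.load(f)


class StreamingContextStub:
    """Hypothetical stand-in for StreamingContext; restored_state is None when fresh."""

    def __init__(self, spark_context: SparkContextStub,
                 restored_state: Optional[dict] = None) -> None:
        self.spark_context = spark_context
        self.restored_state = restored_state

    @classmethod
    def from_checkpoint(cls, checkpoint_dir: str, spark_context: SparkContextStub,
                        reader: Callable[[str], dict] = read_local_checkpoint
                        ) -> "StreamingContextStub":
        # Proposals 1 and 2: rebuild from the checkpoint, but wrap the caller's
        # SparkContext rather than creating a new one.
        return cls(spark_context, restored_state=reader(checkpoint_dir))

    @classmethod
    def get_or_create(cls, checkpoint_dir: str, spark_context: SparkContextStub,
                      create_func: Callable[[SparkContextStub], "StreamingContextStub"]
                      ) -> "StreamingContextStub":
        # Proposal 3: recover if a checkpoint exists, otherwise delegate to
        # create_func. Both branches receive the same pre-created SparkContext.
        if os.path.exists(os.path.join(checkpoint_dir, CHECKPOINT_FILE)):
            return cls.from_checkpoint(checkpoint_dir, spark_context)
        return create_func(spark_context)


if __name__ == "__main__":
    sc = SparkContextStub("managed-app")
    with tempfile.TemporaryDirectory() as ckpt:
        first = StreamingContextStub.get_or_create(ckpt, sc, StreamingContextStub)
        print(first.restored_state)  # None: no checkpoint yet, so create_func ran
        write_checkpoint(ckpt, {"batch_interval_s": 2})
        second = StreamingContextStub.get_or_create(ckpt, sc, StreamingContextStub)
        print(second.restored_state, second.spark_context is sc)  # {'batch_interval_s': 2} True
```

      On the first call no checkpoint exists, so the creating function runs; on the second call the context is recovered, and it still wraps the same SparkContext object. In Spark itself, recovery rebuilds the streaming computation from the checkpoint data rather than loading a pickled dict.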


    People

      • Assignee: Tathagata Das (tdas)
      • Reporter: Tathagata Das (tdas)
      • Votes: 0
      • Watchers: 4
