Description
Currently, if you want to create a StreamingContext from checkpoint information, the system creates a new SparkContext. This prevents a StreamingContext from being recreated from checkpoints in managed environments where the SparkContext is pre-created.
Proposed solution: Introduce the following constructors and method on StreamingContext:
1. new StreamingContext(checkpointDirectory, sparkContext)
- Recreate StreamingContext from checkpoint using the provided SparkContext
2. new StreamingContext(checkpointDirectory, hadoopConf, sparkContext)
- Recreate StreamingContext from checkpoint using the provided SparkContext, and use the provided Hadoop configuration to read the checkpoint
3. StreamingContext.getOrCreate(checkpointDirectory, sparkContext, createFunction: SparkContext => StreamingContext)
- If a checkpoint file exists, recreate the StreamingContext using the provided SparkContext (that is, call 1.); otherwise create the StreamingContext using the provided createFunction
The corresponding Java and Python APIs have to be added as well.
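
The intended behaviour of proposal 3 can be sketched without Spark on the classpath. This is only an illustration of the decision logic, not the actual implementation: `SparkContext` and `StreamingContext` below are hypothetical stand-in case classes, `fromCheckpoint` stands in for proposed constructor 1, and "a checkpoint exists" is assumed to mean "the directory is a non-empty local directory" (the real check would go through the Hadoop filesystem).

```scala
import java.io.File

// Hypothetical stand-ins so the sketch runs without Spark; not the real classes.
final case class SparkContext(appName: String)
final case class StreamingContext(sparkContext: SparkContext, restoredFrom: Option[String])

object StreamingContextSketch {
  // Stand-in for proposed constructor 1: rebuild from the checkpoint,
  // reusing the caller's SparkContext instead of creating a new one.
  def fromCheckpoint(checkpointDirectory: String, sc: SparkContext): StreamingContext =
    StreamingContext(sc, Some(checkpointDirectory))

  // Assumption: a checkpoint "exists" if the path is a non-empty local directory.
  private def checkpointExists(checkpointDirectory: String): Boolean = {
    val dir = new File(checkpointDirectory)
    dir.isDirectory && Option(dir.list()).exists(_.nonEmpty)
  }

  // Stand-in for proposed method 3: recover if possible, otherwise build fresh.
  // In both branches the provided SparkContext is the one that ends up in use.
  def getOrCreate(
      checkpointDirectory: String,
      sc: SparkContext,
      createFunction: SparkContext => StreamingContext): StreamingContext =
    if (checkpointExists(checkpointDirectory)) fromCheckpoint(checkpointDirectory, sc)
    else createFunction(sc)
}
```

In a managed environment the caller would pass the pre-created context, for example `StreamingContextSketch.getOrCreate(dir, sc, c => StreamingContext(c, None))`, and neither branch constructs a second SparkContext.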