Spark / SPARK-11740

Fix DStream checkpointing logic to prevent failures during checkpoint recovery


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: DStreams
    • Labels: None

    Description

      A checkpoint is written both when a batch is generated and when a batch completes. When the processing time of a batch is greater than the batch interval, the checkpoint for completing an old batch may be written after the checkpoint for generating a new batch. If this happens, the old batch's checkpoint actually holds the latest information, but recovery will not use it. As a result, checkpoint recovery may fail with exceptions about missing RDD checkpoint files.
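
The ordering problem can be illustrated with a small toy model. This is only a sketch and not Spark's code: `CheckpointStore`, `use_latest_time`, and `run` are hypothetical names, and the model assumes that checkpoint files are keyed by batch time and that recovery picks the file with the newest batch time. The `use_latest_time` option shows one possible remedy under those assumptions, namely never filing a snapshot under a time older than one already written.

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointStore:
    """Toy checkpoint directory: maps a file's batch time to its snapshot."""
    use_latest_time: bool = False   # hypothetical switch for the remedy
    files: dict = field(default_factory=dict)
    latest_time: int = 0

    def write(self, batch_time: int, snapshot: str) -> None:
        self.latest_time = max(self.latest_time, batch_time)
        # Remedy (assumption): file the snapshot under the newest batch
        # time seen so far instead of the batch's own time.
        key = self.latest_time if self.use_latest_time else batch_time
        self.files[key] = snapshot

    def recover(self) -> str:
        # Assumption: recovery reads the file with the newest batch time.
        return self.files[max(self.files)]


def run(store: CheckpointStore) -> str:
    store.write(1000, "state after generating batch 1000")
    store.write(2000, "state after generating batch 2000")
    # Batch 1000 ran longer than the batch interval, so its completion
    # checkpoint is written last and holds the most recent state.
    store.write(1000, "state after completing batch 1000")
    return store.recover()


naive_result = run(CheckpointStore(use_latest_time=False))
fixed_result = run(CheckpointStore(use_latest_time=True))
print(naive_result)  # stale: state after generating batch 2000
print(fixed_result)  # latest: state after completing batch 1000
```

In the naive run the most recent snapshot is filed under the older batch time, so recovery reads a stale one, which is the situation the description associates with missing RDD checkpoint files.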

          People

            Assignee: zsxwing Shixiong Zhu
            Reporter: zsxwing Shixiong Zhu
