SPARK-19304

Kinesis checkpoint recovery is 10x slow

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.2.0
    • Component/s: Spark Core
    • Labels:
    • Environment:

      S3 for checkpoints; 1 executor with 19g memory and 3 cores

    • Target Version/s:

      Description

      The application runs fine initially with a batch interval of 1 hour, and the processing time is less than 30 minutes per batch on average. If the application crashes for some reason and we restart it from the checkpoint, processing takes forever and does not move forward. We tested the same scenario with a batch interval of 1 minute: in normal operation each batch takes about 1.2 minutes to finish, but when recovering from the checkpoint each batch takes about 15 minutes. Once recovery is complete, batches are processed at normal speed again.
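
As a rough check on the "10x" in the title, the figures quoted above can be turned into a slowdown factor. The sketch below uses only the numbers from the description. The projection onto the 1-hour batches assumes the same factor carries over, which the report does not state. All variable names are illustrative.

```python
# Figures reported in the description, in minutes.
normal_minutes = 1.2      # per-batch processing time at a 1-minute batch interval
recovery_minutes = 15.0   # per-batch processing time while recovering from checkpoint

# Slowdown factor observed during checkpoint recovery.
slowdown = recovery_minutes / normal_minutes
print(f"slowdown during recovery: {slowdown:.1f}x")

# ASSUMPTION (not from the report): the same factor applies to 1-hour batches.
hourly_interval_minutes = 60.0
hourly_normal_minutes = 30.0  # "less than 30 minutes on average", used as an upper bound
projected_minutes = hourly_normal_minutes * slowdown

print(f"projected recovery time per 1-hour batch: up to {projected_minutes / 60:.2f} hours")
print("exceeds batch interval:", projected_minutes > hourly_interval_minutes)
```

Under that assumption, a recovered 1-hour batch could take several hours, far longer than the batch interval. That would be consistent with the observation that processing "does not move forward" after a restart.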

      I suspect the KinesisBackedBlockRDD used for recovery is causing the slowdown.

      Stack Overflow post with more details: http://stackoverflow.com/questions/38390567/spark-streaming-checkpoint-recovery-is-very-very-slow

            People

            • Assignee:
              gaurav24 Gaurav Shah
              Reporter:
              gaurav24 Gaurav Shah
            • Votes:
              2
              Watchers:
              4
