Spark / SPARK-17606

No new batches are created once 1000 batches have been recreated after restarting streaming from a checkpoint.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.6.1
    • Fix Version/s: None
    • Component/s: DStreams

    Description

      When Spark restarts from a checkpoint after being down for a while, it recreates the batches that were missed during the downtime.

      When only a few batches are missing, Spark creates a new incoming batch every batchTime. But when the downtime is long enough to produce 1000 missed batches, no new batches are created.
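      To give a sense of scale, here is a minimal sketch of the arithmetic, assuming one batch is recreated per batch interval of downtime. The 10-second interval and the function names are hypothetical; the report does not state the actual interval.

```python
def missed_batches(downtime_seconds: int, batch_interval_seconds: int) -> int:
    """Estimate how many whole batch intervals elapsed during the downtime."""
    if batch_interval_seconds <= 0:
        raise ValueError("batch interval must be positive")
    return downtime_seconds // batch_interval_seconds


def downtime_to_reach(batch_count: int, batch_interval_seconds: int) -> int:
    """Downtime in seconds after which `batch_count` batches are missing."""
    return batch_count * batch_interval_seconds


# Hypothetical batch interval of 10 seconds (an assumption, not from the report).
interval = 10
threshold = downtime_to_reach(1000, interval)   # seconds of downtime for 1000 batches
print(threshold, missed_batches(threshold, interval))
```

      Under these assumptions, a little under three hours of downtime is already enough to reach 1000 missed batches; a shorter batch interval reaches that count sooner.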

      So once all of these batches have completed, the stream sits idle.

      I think there is a hard limit set somewhere.

      I expected Spark to keep recreating the missed batches, maybe not all at once (because that looks like it causes driver memory problems), and then to resume creating a new batch every batchTime.

      Another solution would be to skip the missing batches entirely but still restart the direct input.

      Right now, the only way I have found to restart a stream after a long break is to remove the checkpoint so that a new stream can be created, but that loses all my state.

      PS: I'm talking about the direct Kafka input because it's the source I'm currently using; I don't know what happens with other sources.

          People

            Assignee: Unassigned
            Reporter: etienne (crakjie)