  1. Spark
  2. SPARK-30774

The default checkpointing interval is not as claimed in the comment.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.4.5
    • Fix Version/s: None
    • Component/s: DStreams

    Description

      https://github.com/apache/spark/blob/71737861531180bbda9aec8d241b1428fe91cab2/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L199-L203

      The checkpoint duration is set to the window duration. Perhaps the idea in the old comment, which wanted it set to the higher of 10s or the window size, is no longer relevant.

      I propose that we either adapt the comment to say that we set the checkpoint duration to the window size and clean up how that value is set, or change the code to do what the comment says.

       

      So, the original statement I made was wrong. This code is still broken, though. Consider the case where the window duration is 3s: the result would be a checkpoint interval of 12s. That doesn't correspond to the rule implied by the comment (the higher of 10s or the window size, which would give 10s) and is thus unexpected behaviour.

      This code does, however, result in the checkpoint interval being a multiple of the slide duration, which is safe as far as I know.
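
To make the mismatch concrete, here is a minimal standalone sketch in plain Scala. It assumes the linked lines compute the smallest multiple of the slide duration that is at least 10 seconds, which is consistent with the 3s to 12s case described above. The object and method names are hypothetical, and durations are plain milliseconds rather than Spark's `Duration` type.

```scala
// Standalone sketch, not Spark code: durations are plain milliseconds.
object CheckpointIntervalSketch {
  val TenSecondsMs: Long = 10000L

  // Assumed behaviour of the linked code: the smallest multiple of the
  // slide duration that is at least 10 seconds.
  def asImplemented(slideMs: Long): Long =
    slideMs * math.ceil(TenSecondsMs.toDouble / slideMs).toLong

  // Rule stated in the comment: the larger of the duration and 10 seconds.
  def asCommented(slideMs: Long): Long =
    math.max(slideMs, TenSecondsMs)

  def main(args: Array[String]): Unit = {
    // Compare both rules for a few durations, including the 3s case.
    for (slideMs <- Seq(3000L, 5000L, 15000L)) {
      println(
        s"slide=${slideMs}ms implemented=${asImplemented(slideMs)}ms " +
          s"commented=${asCommented(slideMs)}ms"
      )
    }
  }
}
```

Under these assumptions the two rules agree whenever the duration divides 10s evenly or exceeds 10s, and differ otherwise. Only the first rule always yields a multiple of the slide duration.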

       

       

          People

            Assignee: Unassigned
            Reporter: Kyle Krueger (kykrueger)