Spark / SPARK-30774

The default checkpointing interval is not as claimed in the comment.


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.4.5
    • Fix Version/s: None
    • Component/s: DStreams
    • Labels:

      Description

      https://github.com/apache/spark/blob/71737861531180bbda9aec8d241b1428fe91cab2/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L199-L203

      The checkpoint duration is set to the window duration. Perhaps the intent of the old comment, to use the larger of 10 s and the window size, is no longer relevant.

      I propose that we either update the comment to say simply that the checkpoint duration is set to the window size, and clean up how that value is set, or change the code to do what the comment describes.

       

      So, my original statement was wrong. The code is still broken, though. Consider a slide duration of 3 s: the code rounds 10 s up to the next multiple of the slide duration, giving ceil(10 / 3) × 3 s = 12 s. The comment implies the larger of 3 s and 10 s, i.e. 10 s, so the actual result does not follow the comment's rule and is unexpected behaviour.
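      To make the mismatch concrete, here is a small Python sketch of the arithmetic. It is an illustration, not Spark code. It assumes that the linked lines compute the default as `slideDuration * math.ceil(Seconds(10) / slideDuration).toInt`. It works in milliseconds, and the function names are hypothetical.

```python
import math

TEN_SECONDS_MS = 10_000


def current_default_ms(slide_ms: int) -> int:
    """Hypothetical mirror of the Scala expression (assumed from the linked lines):
    slideDuration * math.ceil(Seconds(10) / slideDuration).toInt,
    i.e. the smallest multiple of the slide duration that is >= 10 s."""
    return slide_ms * math.ceil(TEN_SECONDS_MS / slide_ms)


def comment_rule_ms(slide_ms: int) -> int:
    """What the comment describes: slideDuration or 10 s, whichever is larger."""
    return max(slide_ms, TEN_SECONDS_MS)


# Compare both rules for a few slide durations (given in seconds).
for slide_s in (1, 3, 4, 7, 10, 15):
    slide_ms = slide_s * 1000
    print(f"slide={slide_s}s "
          f"code={current_default_ms(slide_ms) // 1000}s "
          f"comment={comment_rule_ms(slide_ms) // 1000}s")
```

      The two rules agree only when the slide duration divides 10 s evenly or is at least 10 s. For slides of 3 s, 4 s and 7 s, the code yields 12 s, 12 s and 14 s, where the comment suggests 10 s.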

      The code does, however, always produce a checkpoint interval that is a multiple of the slide duration, which, as far as I know, is safe.
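      The multiple-of-slide property can be checked with another small Python sketch. It assumes, as I read the linked lines, that the default is `slideDuration * math.ceil(Seconds(10) / slideDuration).toInt`, and the function names are hypothetical. Over a range of slide durations, it checks three things about the current default:

      • it is always a whole number of slides;
      • it is never below the larger of the slide duration and 10 s;
      • it overshoots that value by less than one slide.

      It also shows that following the comment literally would lose the multiple property.

```python
import math

TEN_SECONDS_MS = 10_000


def current_default_ms(slide_ms: int) -> int:
    # Hypothetical mirror of: slideDuration * math.ceil(Seconds(10) / slideDuration).toInt
    return slide_ms * math.ceil(TEN_SECONDS_MS / slide_ms)


def check_current_default(slide_ms: int) -> None:
    interval = current_default_ms(slide_ms)
    bound = max(slide_ms, TEN_SECONDS_MS)  # the value the comment talks about
    # Always a whole number of slide intervals.
    assert interval % slide_ms == 0
    # Never shorter than max(slide, 10 s).
    assert interval >= bound
    # Overshoots max(slide, 10 s) by less than one slide interval.
    assert interval < bound + slide_ms


# Slide durations from 100 ms to 60 s in 100 ms steps.
for slide_ms in range(100, 60_001, 100):
    check_current_default(slide_ms)

# Taking the comment literally, max(slide, 10 s) for a 3 s slide is 10 s,
# which is not a whole number of 3 s slides.
literal_ms = max(3_000, TEN_SECONDS_MS)
print(literal_ms % 3_000 == 0)  # False
```

      This only confirms the arithmetic. Whether Spark actually relies on the interval being a multiple of the slide duration is not checked here.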

       

       


            People

            • Assignee:
              Unassigned
              Reporter:
              Kyle Krueger (kykrueger)
