Spark / SPARK-25171

After restart, StreamingContext is replaying the last successful micro-batch right before the stop


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.1
    • Fix Version/s: None
    • Component/s: DStreams
    • Labels:

      Description

      Please look at this line:

      https://github.com/apache/spark/blob/8bde4678166f5f01837919d4f8d742b89f5e76b8/streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala#L216

      "checkpointTime" represents a successful micro-batch. Why do we still treat it as "pending"?

      I think this is a bug. It causes duplicate processing.
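
      For context, the link points into JobGenerator's restart logic, where the batches to reschedule are derived from a time range that begins at the checkpointed batch time. The following is a minimal, self-contained Scala sketch, not Spark source; the object name, batch interval, and timestamps are hypothetical. It only illustrates how a start-inclusive range re-includes the batch that had already completed before the stop:

      // Minimal sketch (not Spark code): the names and timestamps below are
      // hypothetical. It shows how a time range that starts at the checkpointed
      // (already completed) batch time re-includes that batch when jobs are
      // rescheduled after a restart.
      object RestartRangeSketch {
        def main(args: Array[String]): Unit = {
          val batchDurationMs  = 10000L    // hypothetical 10 s batch interval
          val checkpointTimeMs = 1000000L  // time of the last successful micro-batch
          val restartTimeMs    = 1040000L  // time at which the context restarts

          // Start-inclusive, end-exclusive range, analogous to building the
          // "down time" batches from checkpointTime up to restartTime.
          val downTimes = checkpointTimeMs until restartTimeMs by batchDurationMs

          println(downTimes.mkString(", "))
          // 1000000, 1010000, 1020000, 1030000
          // The first entry is the batch that had already completed before the
          // stop, which is what the report describes as duplicate processing.
        }
      }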

       

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
            • Reporter:
              Haopu Wang
            • Shepherd:
              Saisai Shao
            • Votes:
              0
            • Watchers:
              1

              Dates

              • Created:
              • Updated:
              • Resolved: