Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-18790

Keep a general offset history of stream batches

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 2.0.3, 2.1.0
    • Structured Streaming
    • None

    Description

      Instead of only keeping the minimum number of offsets around, we should keep enough information to allow us to roll back n batches and reexecute the stream starting from a given point. In particular, we should create a config in SQLConf, spark.sql.streaming.retainedBatches that defaults to 100 and ensure that we keep enough log files in the following places to roll back the specified number of batches:

      the offsets that are present in each batch
      versions of the state store
      the files lists stored for the FileStreamSource
      the metadata log stored by the FileStreamSink

      Attachments

        Activity

          People

            tcondie Tyson Condie
            tcondie Tyson Condie
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: