[SPARK-18790] Keep a general offset history of stream batches - ASF JIRA


Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.0.3, 2.1.0
Component/s: Structured Streaming
Labels:
None

Description

Instead of keeping only the minimum number of offsets around, we should keep enough information to roll back n batches and re-execute the stream starting from a given point. In particular, we should create a config in SQLConf, spark.sql.streaming.retainedBatches, that defaults to 100, and ensure that we keep enough log files in the following places to roll back the specified number of batches:

- the offsets that are present in each batch
- versions of the state store
- the file lists stored for the FileStreamSource
- the metadata log stored by the FileStreamSink
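The retention policy described above can be sketched as a sliding window over batch IDs that is applied to every log at once. This is a minimal Python sketch under stated assumptions, not Spark's implementation: `BatchLog`, `purge_threshold`, `commit_batch`, and `can_roll_back` are hypothetical names, each log is assumed to be keyed by a monotonically increasing batch ID, and "rolling back" is simplified to "the metadata for that batch is still present".

```python
"""Hypothetical sketch of a retained-batch window for stream metadata logs.

Assumption: each log (offsets, state store versions, source file lists,
sink metadata) is keyed by a monotonically increasing batch ID.
None of these names are Spark APIs.
"""
from typing import Dict, List

# Mirrors the proposed default of spark.sql.streaming.retainedBatches.
RETAINED_BATCHES_DEFAULT = 100


class BatchLog:
    """A toy per-batch log keyed by batch ID (hypothetical, not a Spark class)."""

    def __init__(self) -> None:
        self.entries: Dict[int, str] = {}

    def add(self, batch_id: int, metadata: str) -> None:
        self.entries[batch_id] = metadata

    def purge(self, threshold_batch_id: int) -> None:
        # Drop every entry strictly older than threshold_batch_id.
        for batch_id in [b for b in self.entries if b < threshold_batch_id]:
            del self.entries[batch_id]

    def batch_ids(self) -> List[int]:
        return sorted(self.entries)


def purge_threshold(current_batch_id: int, retained_batches: int) -> int:
    """Oldest batch ID that must survive so `retained_batches` batches remain."""
    if retained_batches < 1:
        raise ValueError("retained_batches must be at least 1")
    return max(0, current_batch_id - retained_batches + 1)


def commit_batch(logs: List[BatchLog], batch_id: int, metadata: str,
                 retained_batches: int = RETAINED_BATCHES_DEFAULT) -> None:
    # Write the new batch to every log, then trim all logs to the same window
    # so offsets, state versions, and file lists stay consistent with each other.
    threshold = purge_threshold(batch_id, retained_batches)
    for log in logs:
        log.add(batch_id, metadata)
        log.purge(threshold)


def can_roll_back(log: BatchLog, target_batch_id: int) -> bool:
    """True if the log still holds the metadata for target_batch_id."""
    return target_batch_id in log.entries
```

For example, committing batches 0 through 9 with `retained_batches=3` leaves each log holding batches `[7, 8, 9]`, so the stream could be re-executed from batch 7 but not from batch 6. Purging every log against one shared threshold is the point of the design: if, say, the offset log kept batch 6 but the state store had already dropped the matching version, the rollback would not be possible even though one log suggests it is.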

Issue Links

links to

[Github] Pull Request #16219 (tcondie)

People

Assignee: Tyson Condie

Reporter: Tyson Condie

Votes: 0

Watchers: 3

Dates

Created: 08/Dec/16 20:47

Updated: 15/Dec/16 05:04

Resolved: 12/Dec/16 07:39