Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version: 2.3.0
- Labels: None
Description
In Structured Streaming, the configuration spark.sql.streaming.minBatchesToRetain specifies 'the minimum number of batches that must be retained and made recoverable', as described in SQLConf. In continuous processing, the metadata purge is triggered when an epoch is committed in ContinuousExecution.
Because currentBatchId increases independently in continuous processing mode, the most recently committed epoch may be far behind currentBatchId if a task hangs for some time. It is therefore unsafe to discard metadata using a thresholdBatchId computed from currentBatchId: doing so may purge all of the metadata in the checkpoint directory.
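To make the failure mode concrete, here is a minimal sketch (in Python, not Spark's actual Scala code) contrasting a purge threshold derived from currentBatchId with one derived from the last committed epoch. The function names (`purge_threshold_unsafe`, `purge_threshold_safe`) are hypothetical, not Spark APIs; the retention logic is a simplified model of the behavior described above.

```python
# Hypothetical sketch: entries with id < threshold are purged from the
# offset/commit logs. MIN_BATCHES_TO_RETAIN models
# spark.sql.streaming.minBatchesToRetain.

MIN_BATCHES_TO_RETAIN = 100

def purge_threshold_unsafe(current_batch_id: int) -> int:
    # Threshold derived from currentBatchId, which keeps advancing in
    # continuous processing even while commits lag behind (e.g. a hung task).
    return current_batch_id - MIN_BATCHES_TO_RETAIN + 1

def purge_threshold_safe(committed_epoch: int) -> int:
    # Threshold derived from the last *committed* epoch, so the retained
    # window always covers the batches needed for recovery.
    return committed_epoch - MIN_BATCHES_TO_RETAIN + 1

# Example: commits stall at epoch 50 while currentBatchId races ahead to 500.
current_batch_id, committed_epoch = 500, 50
unsafe = purge_threshold_unsafe(current_batch_id)  # 401 > 50: every entry up
                                                   # to the committed epoch
                                                   # would be deleted
safe = purge_threshold_safe(committed_epoch)       # -49: nothing purged yet
```

With the currentBatchId-based threshold, the committed epoch itself falls below the purge cutoff, so the checkpoint directory loses everything needed for recovery; the epoch-based threshold keeps the recoverable window intact.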