[FLINK-33109] Watermark alignment not applied after recovery from checkpoint - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Not A Problem
Affects Version/s: 1.17.1
Fix Version/s: None
Component/s: Runtime / Coordination
Labels: None

Description

I am observing a problem where, after recovery from a checkpoint, the Kafka source watermarks start to diverge and no longer honor the watermark alignment setting I have applied.

I have a Kafka source which reads a topic with 32 partitions. I am applying the following watermark strategy:

new EventAwareWatermarkStrategy[KeyedKafkaSourceMessage](msg => msg.value.getTimestamp)
      .withWatermarkAlignment("alignment-sources-group", java.time.Duration.ofMillis(sourceWatermarkAlignmentBlocks))

This works well until my job needs to recover from a checkpoint. Once recovery takes place, alignment stops entirely. This is best illustrated by the watermark metrics of the various operators in the attached screenshots.

You can see how the watermarks disperse after the recovery. While debugging the problem, I noticed that before the failure there were calls to SourceCoordinator::announceCombinedWatermark(); after the recovery, no calls reach it, so the value of watermarkAlignmentParams.getMaxAllowedWatermarkDrift() is never read.

I can work around the problem if I stop the job, clear all state from ZooKeeper, and then manually start Flink from the last checkpoint using the '--fromSavepoint' flag. This causes the SourceCoordinator to be constructed properly and the watermark drift to be checked. After this manual recovery, the watermarks again converge to within the allowed drift, as seen in the metrics in the attached screenshots.

Let me know if I can help by providing any more information.
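To make the expected behaviour concrete, here is a minimal, hypothetical Scala model of the alignment check that appears to stop running after recovery. The object and method names are illustrative assumptions, not Flink APIs, and the rule it encodes is an assumption about how the source coordinator uses getMaxAllowedWatermarkDrift(): the combined watermark of the alignment group is the minimum reported watermark, and a split is paused while its watermark exceeds that minimum plus the maximum allowed drift.

```scala
// Hypothetical model of watermark alignment, for illustration only.
// Names are NOT Flink APIs. Assumed rule: combined watermark = minimum of the
// reported watermarks in the alignment group; a split is paused while its
// watermark exceeds combined + maxAllowedDrift.
object WatermarkAlignmentSketch {

  /** Combined watermark of an alignment group: the minimum reported value.
    * Assumes at least one watermark has been reported. */
  def combinedWatermark(reported: Map[String, Long]): Long =
    reported.values.min

  /** Upper bound a split's watermark may reach before it is paused. */
  def maxAllowedWatermark(combined: Long, maxDriftMillis: Long): Long =
    combined + maxDriftMillis

  /** Splits whose watermark runs ahead of the allowed bound. */
  def splitsToPause(splitWatermarks: Map[String, Long], maxAllowed: Long): Set[String] =
    splitWatermarks.collect { case (split, wm) if wm > maxAllowed => split }.toSet

  def main(args: Array[String]): Unit = {
    // Hypothetical watermarks (epoch millis) for three Kafka partitions.
    val splits = Map("p0" -> 1000L, "p1" -> 5000L, "p2" -> 2500L)
    val combined = combinedWatermark(splits)              // 1000
    val maxAllowed = maxAllowedWatermark(combined, 2000L) // 3000
    // With alignment applied, only p1 is paused; if the combined watermark is
    // never announced (as observed after recovery), no split is ever paused.
    println(splitsToPause(splits, maxAllowed))            // prints Set(p1)
  }
}
```

Running main prints Set(p1): the partition at 5000 ms is held back until the slowest partition (p0) advances far enough that p1 is within 2000 ms of it.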

Attachments

WatermarkTest-1.scala (4 kB, uploaded by Yordan Pavlov on 06/Oct/23 15:43)
image-2023-09-18-15-46-16-106.png (69 kB, uploaded by Yordan Pavlov on 18/Sep/23 12:46)
image-2023-09-18-15-40-06-868.png (102 kB, uploaded by Yordan Pavlov on 18/Sep/23 12:40)

People

Assignee: Unassigned

Reporter: Yordan Pavlov

Votes: 0

Watchers: 6

Dates

Created: 18/Sep/23 12:49

Updated: 16/May/24 01:38

Resolved: 16/May/24 01:38