Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
Currently, Samza's event-time watermark aggregation logic computes the watermark as the minimum of the watermarks from all upstream tasks. However, when an upstream task is lagging, it might not generate a watermark for a long period. In that case no new aggregated watermark is generated, and downstream aggregation is stuck.
To address this issue, we will implement a mechanism that excludes tasks that have been "idle" (have not generated a watermark) for a configured time, so that aggregated watermarks continue to be generated. Note that this mechanism unblocks downstream processing, but at the risk of advancing the event-time clock too quickly, in which case events from the lagging tasks become late arrivals.
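
The proposed behavior could be sketched roughly as follows. This is only an illustration under assumptions, not Samza's actual watermark code: the class name `IdleAwareWatermarkAggregator`, the `idleTimeoutMs` setting, and the method names are all hypothetical, and the caller passes wall-clock time explicitly to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of min-watermark aggregation that skips idle upstream tasks. */
final class IdleAwareWatermarkAggregator {
    static final long NO_WATERMARK = -1L;

    private final long idleTimeoutMs; // hypothetical config: how long before a task counts as idle
    private final Map<String, Long> watermarks = new HashMap<>();
    private final Map<String, Long> lastUpdateMs = new HashMap<>();
    private long aggregated = NO_WATERMARK;

    IdleAwareWatermarkAggregator(Iterable<String> upstreamTasks, long idleTimeoutMs, long nowMs) {
        this.idleTimeoutMs = idleTimeoutMs;
        for (String task : upstreamTasks) {
            watermarks.put(task, NO_WATERMARK);
            lastUpdateMs.put(task, nowMs); // the idle timer starts at creation
        }
    }

    /** Records a watermark from an upstream task; unknown tasks are ignored. */
    void update(String task, long watermark, long nowMs) {
        if (!watermarks.containsKey(task)) {
            return;
        }
        watermarks.merge(task, watermark, Math::max); // per-task watermarks never go backwards
        lastUpdateMs.put(task, nowMs);
    }

    /** Returns the min watermark over non-idle tasks; the result never decreases. */
    long aggregate(long nowMs) {
        long min = Long.MAX_VALUE;
        boolean anyActive = false;
        for (Map.Entry<String, Long> e : watermarks.entrySet()) {
            boolean idle = nowMs - lastUpdateMs.get(e.getKey()) >= idleTimeoutMs;
            if (idle) {
                continue; // excluded: this task no longer holds back the aggregate
            }
            anyActive = true;
            min = Math.min(min, e.getValue());
        }
        // If every task is idle, hold the previous value rather than guessing.
        if (anyActive && min > aggregated) {
            aggregated = min;
        }
        return aggregated;
    }
}
```

In this sketch, once a lagging task is excluded and the aggregate moves past it, a later watermark from that task cannot pull the aggregate back. Events from that task with timestamps below the aggregate are the late arrivals mentioned above.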