Details
-
Bug
-
Status: Resolved
-
P2
-
Resolution: Fixed
-
None
-
None
Description
In the DoFnOperator and the subclassed ExecutableStageDoFnOperator there are
effectively two processing threads:
(1) The main processing thread for processing elements and watermarks
(2) A timer thread to support ending bundles after a timeout to optimize for
latency
Although the code was written with a different assumption in the past, the two
threads do not interleave with each other. Only one is active at a time. This is
ensured by Flink's "checkpointLock" which is acquired for every method called on
the operator like processElement, processWatermark, snapshotState. It is also
acquired when timers are fired which are set using Flink's TimeService, like it
is the case for (2).
We've seen issues with bundles being closed multiple times resulting in
exceptions like "Already closed". Very rarely, we've also seen dead bundle ids
resurrecting for which no other explanation could be found than an inconsistent
view of the two thread.
Attachments
Issue Links
- links to