Details
-
Bug
-
Status: Resolved
-
P2
-
Resolution: Fixed
-
None
Description
ProcessContinuation.resume() is useful for tailing files - when we reach current EOF, we want to voluntarily suspend the process() call rather than wait for runner to checkpoint us.
In BEAM-1903, DoFn.ProcessContinuation was removed because there was ambiguity about the semantics of resume() especially w.r.t. the following situation described in https://docs.google.com/document/d/1BGc8pM1GOvZhwR9SARSVte-20XEoBUxrGJ5gTWXdv3c/edit : the runner has taken a checkpoint on the tracker, and then the ProcessElement call returns resume() signaling that the work is still not done - then there's 2 checkpoints to deal with.
Instead, the proper way to refine this semantics is:
- After checkpoint() on a RestrictionTracker, the tracker MUST fail all subsequent tryClaim() calls, and MUST succeed in checkDone().
- After a failed tryClaim() call, the ProcessElement method MUST return stop()
- So ProcessElement can return resume() only instead of doing tryClaim()
- Then, if the runner has already taken a checkpoint but tracker has returned resume(), we do not need to take a new checkpoint - the one already taken already accurately describes the remainder of the work.