Details
Type: Bug
Status: Resolved
Priority: P1
Resolution: Fixed
Description
Because the Teardown method has no relation to the atomicity of processing and committing output, it is EXTREMELY DANGEROUS to use it to flush outputs: data buffered until Teardown is extremely likely never to be flushed. If a DoFn instance holding buffered data is lost (for example, through worker or machine failure) after the runner has committed the result of processing that input, the data is lost.
Because the documentation does not say this, users may believe (especially when running a batch pipeline) that their data will be flushed on pipeline completion. This is very dangerous behavior that we do not warn of sufficiently.
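The failure mode described above can be illustrated with a toy model. This is a sketch in plain Python, not the Beam API: `BufferingFn`, `run`, and the list used as a sink are hypothetical stand-ins, and the "runner" here simply records a bundle as committed once its finish-bundle step returns. It assumes, as the description states, that a committed input is not reprocessed and that teardown is skipped when the worker is lost.

```python
class BufferingFn:
    """Toy stand-in for a DoFn that buffers elements before writing them out."""

    def __init__(self, sink, flush_in_finish_bundle):
        self.sink = sink  # a list standing in for an external system
        self.flush_in_finish_bundle = flush_in_finish_bundle
        self.buffer = []

    def process(self, element):
        self.buffer.append(element)

    def finish_bundle(self):
        # Runs before the bundle is committed, so a flush here is
        # covered by the same commit as the processed input.
        if self.flush_in_finish_bundle:
            self._flush()

    def teardown(self):
        # Best effort only: never reached if the worker is lost.
        self._flush()

    def _flush(self):
        self.sink.extend(self.buffer)
        self.buffer.clear()


def run(bundles, flush_in_finish_bundle, crash_before_teardown):
    """Hypothetical runner loop: process each bundle, then commit it."""
    sink, committed = [], []
    fn = BufferingFn(sink, flush_in_finish_bundle)
    for bundle in bundles:
        for element in bundle:
            fn.process(element)
        fn.finish_bundle()
        # Once committed, these inputs will not be retried.
        committed.extend(bundle)
    if not crash_before_teardown:
        fn.teardown()
    return sink, committed
```

In this model, a function that flushes only in teardown loses every committed element when the worker crashes, while one that flushes in its finish-bundle step does not. Flushing in teardown appears to work only when no failure occurs, which is why it is easy to mistake for safe behavior.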