Details
-
Bug
-
Status: Resolved
-
P2
-
Resolution: Fixed
-
None
-
None
Description
If the dofn is stateful, garbage collection timers are set for the end of the window plus allowed lateness:
https://github.com/apache/beam/blob/6fdde4f4eab72b49b10a8bb1cb3be263c5c416b5/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFn.java#L491
For the global window this ends up setting garbage collection timers that will only fire once the pipeline is drained. For pipelines that have constantly newly arriving unique stateful keys, and otherwise cleanup their state appropriately when triggering occurs, the # of timers builds up over time.
Example window and trigger, where the user has the opportunity to clean up state for the key after at most a minute. However they have no control over the timer set.
GlobalWindows()
.triggering(Repeatedly.forever(AfterFirst.of(
AfterPane.elementCountAtLeast(5000),
AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardMi
nutes(1))).discardingFiredPanes().withAllowedLateness(Duration.ZERO);