Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-11034

State garbage collection timers set by Dataflow SimpleParDoFn pile up for the GlobalWindow

Details

    • Bug
    • Status: Resolved
    • P2
    • Resolution: Fixed
    • None
    • 2.25.0
    • runner-dataflow
    • None

    Description

      If the dofn is stateful, garbage collection timers are set for the end of the window plus allowed lateness:
      https://github.com/apache/beam/blob/6fdde4f4eab72b49b10a8bb1cb3be263c5c416b5/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFn.java#L491

      For the global window this ends up setting garbage collection timers that will only fire once the pipeline is drained. For pipelines that have constantly newly arriving unique stateful keys, and otherwise cleanup their state appropriately when triggering occurs, the # of timers builds up over time.

      Example window and trigger, where the user has the opportunity to clean up state for the key after at most a minute. However they have no control over the timer set.

      GlobalWindows()
      .triggering(Repeatedly.forever(AfterFirst.of(
      AfterPane.elementCountAtLeast(5000),
      AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardMi
      nutes(1))).discardingFiredPanes().withAllowedLateness(Duration.ZERO);

      Attachments

        Activity

          People

            scwhittle Sam Whittle
            scwhittle Sam Whittle
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 2h 20m
                2h 20m