Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-11034

State garbage collection timers set by Dataflow SimpleParDoFn pile up for the GlobalWindow

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.25.0
    • Component/s: runner-dataflow
    • Labels:
      None

      Description

      If the dofn is stateful, garbage collection timers are set for the end of the window plus allowed lateness:
      https://github.com/apache/beam/blob/6fdde4f4eab72b49b10a8bb1cb3be263c5c416b5/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFn.java#L491

      For the global window this ends up setting garbage collection timers that will only fire once the pipeline is drained. For pipelines that have constantly newly arriving unique stateful keys, and otherwise cleanup their state appropriately when triggering occurs, the # of timers builds up over time.

      Example window and trigger, where the user has the opportunity to clean up state for the key after at most a minute. However they have no control over the timer set.

      GlobalWindows()
      .triggering(Repeatedly.forever(AfterFirst.of(
      AfterPane.elementCountAtLeast(5000),
      AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardMi
      nutes(1))).discardingFiredPanes().withAllowedLateness(Duration.ZERO);

        Attachments

          Activity

            People

            • Assignee:
              scwhittle Sam Whittle
              Reporter:
              scwhittle Sam Whittle
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 2h 20m
                2h 20m