Beam / BEAM-11400

StreamingDataflowWorker stuck commits logic triggers exceptions if commits eventually complete

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.27.0
    • Component/s: runner-dataflow
    • Labels: None

    Description

      Commits that have not completed within a timeout are treated as stuck and lost, and are cancelled. In the logs this shows up as:
      Detected key with sharding key -6893288510319386341 stuck in COMMITTING state, completing it with error.
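
The timeout check described above can be sketched roughly as follows. This is a minimal illustration, not the worker's actual code: the class `StuckCommitDetector`, its methods, and the idea of tracking one start time per sharding key are all assumptions made for the example.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Beam class: tracks when each commit started.
class StuckCommitDetector {
    private final Duration timeout;
    // Sharding key -> time at which its commit entered the COMMITTING state.
    private final Map<Long, Instant> commitStart = new HashMap<>();

    StuckCommitDetector(Duration timeout) {
        this.timeout = timeout;
    }

    void commitStarted(long shardingKey, Instant now) {
        commitStart.put(shardingKey, now);
    }

    // Returns the keys whose commit has been pending longer than the timeout
    // and forgets them, which is the step that later surprises a slow commit.
    List<Long> removeStuck(Instant now) {
        List<Long> stuck = new ArrayList<>();
        for (Map.Entry<Long, Instant> e : commitStart.entrySet()) {
            if (Duration.between(e.getValue(), now).compareTo(timeout) > 0) {
                stuck.add(e.getKey());
            }
        }
        commitStart.keySet().removeAll(stuck);
        return stuck;
    }
}
```

The clock is passed in as a parameter so the behaviour can be exercised without sleeping. The key point is that once a key is reported as stuck, the tracker no longer knows about it.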

      However, if the commit was not lost but merely very slow, the following error occurs when it eventually does complete:

      Exception while processing commit response {}
      "java.lang.NullPointerException
      at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:877)
      at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$ComputationState.completeWork(StreamingDataflowWorker.java:2246)
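
One way to read the stack trace: the completion path assumes the commit is still being tracked, but the stuck-commit logic has already dropped it. The sketch below reproduces that pattern under stated assumptions. `CommitTracker` and its methods are hypothetical, the real `ComputationState` bookkeeping may differ, and `Objects.requireNonNull` stands in for the vendored Guava `Preconditions.checkNotNull`.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for per-computation commit tracking; not the Beam class.
class CommitTracker {
    // Sharding key -> description of the in-flight commit.
    private final Map<Long, String> inFlight = new ConcurrentHashMap<>();

    void startCommit(long shardingKey, String work) {
        inFlight.put(shardingKey, work);
    }

    // Stuck detection gives up on the commit and forgets it.
    void failStuckCommit(long shardingKey) {
        inFlight.remove(shardingKey);
    }

    // Strict completion: assumes the commit is still tracked, so a late
    // response for an already-failed commit throws NullPointerException.
    void completeStrict(long shardingKey) {
        Objects.requireNonNull(inFlight.remove(shardingKey));
    }

    // Tolerant completion: a missing entry is reported to the caller
    // instead of being thrown.
    boolean completeTolerant(long shardingKey) {
        return inFlight.remove(shardingKey) != null;
    }
}
```

Calling `startCommit`, then `failStuckCommit`, then `completeStrict` for the same key throws, while `completeTolerant` simply returns `false`. Whether the actual fix takes this tolerant form is not stated in this report.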

      The exception is raised on the commit stream, which finishes processing the current batch of responses and then throws. The stream therefore completes with an error, and all of the other commits on it are resent. If there were a large number of commits on the stream, progress is slow: only a couple complete before everything is retried again. This slowdown can cause further commits to exceed the timeout, creating a feedback loop.
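
The difference between failing the whole stream and isolating one bad response can be illustrated with a small sketch. Everything here is hypothetical (`CommitResponseBatch`, both methods, and the use of plain request ids); it models the behaviour described above rather than the worker's real stream code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

// Hypothetical model of handling one batch of commit responses.
class CommitResponseBatch {
    // Mirrors the described behaviour: every response in the batch is
    // processed, then the first failure is rethrown, which would end the
    // stream and force all other outstanding commits to be resent.
    static void processThenFail(List<Long> requestIds, LongConsumer onResponse) {
        RuntimeException first = null;
        for (long id : requestIds) {
            try {
                onResponse.accept(id);
            } catch (RuntimeException e) {
                if (first == null) {
                    first = e;
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }

    // Isolating variant: a failing callback is recorded and the stream
    // stays up, so unrelated commits are not retried.
    static List<Long> processIsolated(List<Long> requestIds, LongConsumer onResponse) {
        List<Long> failed = new ArrayList<>();
        for (long id : requestIds) {
            try {
                onResponse.accept(id);
            } catch (RuntimeException e) {
                failed.add(id);
            }
        }
        return failed;
    }
}
```

With the first variant, a single late completion costs a resend of every commit still pending on the stream, which is what feeds the loop. With the second, the cost is limited to the one response that failed.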

            People

              Assignee: Sam Whittle (scwhittle)
              Reporter: Sam Whittle (scwhittle)
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h