BEAM-11400

StreamingDataflowWorker stuck commits logic triggers exceptions if commits eventually complete

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.27.0
    • Component/s: runner-dataflow
    • Labels:
      None

      Description

      Commits that do not complete within a timeout are treated as stuck: they are cancelled and considered lost. In the logs this shows up as:
      Detected key with sharding key -6893288510319386341 stuck in COMMITTING state, completing it with error.

      However, if the commit was not lost but merely very slow, the following error occurs when it eventually completes:

      Exception while processing commit response {}
      java.lang.NullPointerException
      at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:877)
      at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$ComputationState.completeWork(StreamingDataflowWorker.java:2246)
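      The failure mode can be sketched with a small, self-contained Java model. Everything in it is hypothetical: `ActiveWorkSketch`, its methods, and the plain `Map` from sharding key to work token are illustrative assumptions, not the actual `StreamingDataflowWorker` code, and `java.util.Objects.requireNonNull` stands in for Guava's `Preconditions.checkNotNull`. The timeout path forgets the work item; a strict completion that assumes the item is still tracked then throws a `NullPointerException`, while a tolerant completion treats the late response as stale and ignores it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified model of per-key commit tracking.
// Not the real StreamingDataflowWorker implementation.
final class ActiveWorkSketch {
  // sharding key -> work token of the commit currently in flight
  private final Map<Long, Long> committing = new HashMap<>();

  void startCommit(long shardingKey, long workToken) {
    committing.put(shardingKey, workToken);
  }

  // Timeout path: the commit is declared stuck, failed, and forgotten.
  void failStuckCommit(long shardingKey) {
    committing.remove(shardingKey);
  }

  // Mirrors the failing behavior: assumes the work is still tracked, so a
  // late completion for a commit already failed as stuck throws an NPE.
  void completeWorkStrict(long shardingKey, long workToken) {
    Long active =
        Objects.requireNonNull(committing.get(shardingKey), "no active work for sharding key");
    if (active.longValue() == workToken) {
      committing.remove(shardingKey);
    }
  }

  // Tolerant variant (an assumption, not necessarily the actual fix):
  // a completion for unknown or replaced work is treated as stale and dropped.
  boolean completeWorkTolerant(long shardingKey, long workToken) {
    Long active = committing.get(shardingKey);
    if (active == null || active.longValue() != workToken) {
      return false; // stale completion; the commit was already failed as stuck
    }
    committing.remove(shardingKey);
    return true;
  }
}
```

      The tolerant variant is only one possible way to avoid the exception; it assumes that a completion arriving for work already failed as stuck can safely be dropped, whereas the strict variant reproduces the `NullPointerException` shown above.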

      The exception is raised on the commit stream, which finishes processing the current batch of responses and then throws. The stream therefore completes with an error, and all of its other pending commits are resent. If many commits were outstanding on the stream, only a few complete before everything is retried again, so progress is slow. That slowdown can push further commits past the timeout, creating a feedback loop.
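      The amplification on the stream can be modeled the same way. In this hypothetical sketch, `CommitStreamSketch`, `onResponses`, and the `isolateErrors` flag are illustrative names, not Beam APIs: the whole batch of responses is processed, but a single throwing completion callback then fails the stream, and every commit still in flight is counted as resent.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongConsumer;

// Hypothetical model of a commit stream carrying many in-flight commits.
// Names are illustrative; this is not the Beam/Windmill client API.
final class CommitStreamSketch {
  // In-flight commits on this stream, keyed by a request id.
  final Map<Long, String> pending = new LinkedHashMap<>();
  // Total commits resent because the stream completed with an error.
  int resent = 0;

  void send(long requestId, String commit) {
    pending.put(requestId, commit);
  }

  // Processes one batch of commit responses; returns true if the stream failed.
  // Mirrors the reported behavior: the whole batch is processed, then a
  // captured callback error fails the stream and all pending commits are resent.
  boolean onResponses(List<Long> completedIds, LongConsumer onComplete, boolean isolateErrors) {
    RuntimeException failure = null;
    for (long id : completedIds) {
      pending.remove(id);
      try {
        onComplete.accept(id);
      } catch (RuntimeException e) {
        // Hypothetical mitigation: with isolateErrors the error stays local
        // to this response (a real worker would log it here).
        if (!isolateErrors && failure == null) {
          failure = e;
        }
      }
    }
    if (failure != null) {
      // Stream completes with an error: everything still in flight is resent.
      resent += pending.size();
      return true;
    }
    return false;
  }
}
```

      With 100 pending commits and a batch of 3 responses where one callback throws, the stream fails and the remaining 97 commits are resent; with `isolateErrors` set, the same batch leaves the stream healthy. Whether isolating per-response errors is appropriate for the real worker is an assumption of this sketch.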

              People

              • Assignee:
                scwhittle Sam Whittle
                Reporter:
                scwhittle Sam Whittle
              • Votes:
                0
                Watchers:
                2

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  1h