Details
Type: Bug
Status: Resolved
Priority: P2
Resolution: Fixed
Description
Commits that do not complete within a timeout are treated as stuck and lost, and are cancelled. In the logs this shows up as:
Detected key with sharding key -6893288510319386341 stuck in COMMITTING state, completing it with error.
However, if the commit was not lost but merely very slow, the following error occurs when it eventually completes:
Exception while processing commit response {}
java.lang.NullPointerException
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:877)
	at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$ComputationState.completeWork(StreamingDataflowWorker.java:2246)
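The race behind this NPE can be modeled with a minimal Java sketch. `CommitTracker`, `startCommit`, `failStuckCommits`, and `completeWork` are hypothetical names, not the real `StreamingDataflowWorker` internals, and `Objects.requireNonNull` stands in for Guava's `Preconditions.checkNotNull`. A commit that exceeds the timeout is failed and its entry removed. When its response arrives late, the lookup finds nothing and the null check throws.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical model of the stuck-commit race; names are illustrative only. */
public class StuckCommitRace {

  /** Tracks in-flight commits, keyed by sharding key. */
  static final class CommitTracker {
    // sharding key -> time (millis) at which the commit was started
    private final Map<Long, Long> committing = new HashMap<>();

    void startCommit(long shardingKey, long nowMillis) {
      committing.put(shardingKey, nowMillis);
    }

    /** Removes every commit in flight longer than timeoutMillis; returns how many were failed. */
    int failStuckCommits(long nowMillis, long timeoutMillis) {
      int before = committing.size();
      // The removed keys are "completed with error" and forgotten.
      committing.values().removeIf(startMillis -> nowMillis - startMillis > timeoutMillis);
      return before - committing.size();
    }

    /** Called when a commit response arrives; assumes the entry still exists. */
    void completeWork(long shardingKey) {
      // Mirrors the checkNotNull in the stack trace: a late response for a key
      // that was already failed as stuck finds no entry and throws an NPE.
      Objects.requireNonNull(
          committing.remove(shardingKey), "no commit in flight for key " + shardingKey);
    }
  }

  public static void main(String[] args) {
    CommitTracker tracker = new CommitTracker();
    long key = -6893288510319386341L;
    tracker.startCommit(key, 0);
    tracker.failStuckCommits(10_000, 5_000); // the commit looks stuck and is failed
    try {
      tracker.completeWork(key); // the slow commit finally completes
    } catch (NullPointerException e) {
      System.out.println("late completion: " + e.getMessage());
    }
  }
}
```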
The exception is thrown on the commit stream after it finishes processing the current batch of responses. It causes the stream to complete with an error, so all of the other commits on that stream are resent. If the stream carried a large number of commits, progress is slow: only a few complete before everything is retried again. This slowdown can push further commits past the timeout, creating a feedback loop.
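One possible mitigation, sketched below with hypothetical names (`CommitTracker`, `processResponses`), is to treat a response for a key that is no longer tracked as a stale, already-failed commit: log it and continue, so the exception never reaches the stream and the rest of the batch is not resent. This is an illustration under stated assumptions, not necessarily the fix that resolved this issue.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical sketch: ignore late responses instead of failing the whole stream. */
public class TolerantCommitHandling {
  private static final Logger LOG = Logger.getLogger(TolerantCommitHandling.class.getName());

  static final class CommitTracker {
    // sharding key -> time (millis) at which the commit was started
    private final Map<Long, Long> committing = new HashMap<>();

    void startCommit(long shardingKey, long nowMillis) {
      committing.put(shardingKey, nowMillis);
    }

    /** Removes every commit in flight longer than timeoutMillis. */
    void failStuckCommits(long nowMillis, long timeoutMillis) {
      committing.values().removeIf(startMillis -> nowMillis - startMillis > timeoutMillis);
    }

    /** Returns false, instead of throwing, when the key is no longer tracked. */
    boolean completeWork(long shardingKey) {
      if (committing.remove(shardingKey) == null) {
        LOG.warning("Ignoring late commit response for untracked key " + shardingKey);
        return false;
      }
      return true;
    }
  }

  /** Processes one batch of commit responses; returns how many completed normally. */
  static int processResponses(CommitTracker tracker, List<Long> responseKeys) {
    int completed = 0;
    for (long key : responseKeys) {
      if (tracker.completeWork(key)) {
        completed++;
      }
    }
    // No exception escapes, so the stream is not torn down and nothing is resent.
    return completed;
  }

  public static void main(String[] args) {
    CommitTracker tracker = new CommitTracker();
    tracker.startCommit(1L, 0); // slow commit
    tracker.startCommit(2L, 9_000);
    tracker.startCommit(3L, 9_500);
    tracker.failStuckCommits(10_000, 5_000); // key 1 is failed as stuck
    int completed = processResponses(tracker, List.of(1L, 2L, 3L));
    System.out.println(completed + " commits completed"); // prints "2 commits completed"
  }
}
```

Dropping the late response is only reasonable if the worker has already dealt with the work it failed as stuck (for example, by retrying it); the sketch does not model that part.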