Beam / BEAM-8810

Dataflow runner - Work stuck in state COMMITTING with streaming commit rpcs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.19.0
    • Component/s: runner-dataflow
    • Labels:
      None

      Description

      In several pipelines using Streaming Engine, and thus the streaming commit RPCs, work became stuck in state COMMITTING indefinitely. The stalls coincided with repeated streaming RPC failures.

      The status page shows that the key has work in state COMMITTING, and has 1 queued work item.
      There is a single active commit stream, with 0 pending requests.

      The stream could outlive its deadline because the StreamCache closes an expired stream only when a stream is retrieved from the cache, and retrieval happens only when there are other commits. Since the pipeline is stuck, there are no other commits, so the expired stream is never closed.
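
The lazy-expiry behavior described above can be illustrated with a minimal sketch. It is not the Dataflow worker's StreamCache; `LazyDeadlineCache`, `FakeStream`, and the injected clock are hypothetical names used only to show why a deadline that is checked solely on retrieval never fires when nothing retrieves a stream.

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for a commit stream; it only records whether it was closed.
final class FakeStream {
  boolean closed = false;

  void close() {
    closed = true;
  }
}

// Hypothetical cache that enforces the stream deadline only inside get().
final class LazyDeadlineCache {
  private final LongSupplier clockMillis; // injected clock, an assumption to keep the sketch testable
  private final long deadlineMillis;
  private FakeStream current;
  private long createdAtMillis;

  LazyDeadlineCache(LongSupplier clockMillis, long deadlineMillis) {
    this.clockMillis = clockMillis;
    this.deadlineMillis = deadlineMillis;
  }

  synchronized FakeStream get() {
    long now = clockMillis.getAsLong();
    // The deadline is checked here and nowhere else.
    if (current != null && now - createdAtMillis >= deadlineMillis) {
      current.close();
      current = null;
    }
    if (current == null) {
      current = new FakeStream();
      createdAtMillis = now;
    }
    return current;
  }
}
```

In this sketch, a stream obtained at time 0 with a 100 ms deadline remains open at time 1000 unless some caller invokes `get()` again. If the only caller is blocked waiting on that same stream, the expiry check never runs.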

      It therefore seems that one of the following is happening: a race on the commit stream between onNewStream and commitWork that prevents work from being retried; an exception thrown between the removal of the pending request and the invocation of its callback; or corruption of the activeWork data structure.
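
The second hypothesis, an exception between removing the pending request and calling its callback, would match the observed symptom of 0 pending requests with work still in state COMMITTING. The sketch below illustrates that window only; `PendingCommits` and its methods are hypothetical and do not reflect the actual worker code or the fix that was applied.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical registry of in-flight commit requests keyed by request id.
final class PendingCommits {
  private final Map<Long, Consumer<Boolean>> pending = new ConcurrentHashMap<>();

  void add(long id, Consumer<Boolean> onDone) {
    pending.put(id, onDone);
  }

  int size() {
    return pending.size();
  }

  // Fragile: if betweenSteps throws, the request has already left the map
  // but its callback never runs, so the owner waits forever.
  void completeFragile(long id, boolean ok, Runnable betweenSteps) {
    Consumer<Boolean> onDone = pending.remove(id);
    betweenSteps.run();
    if (onDone != null) {
      onDone.accept(ok);
    }
  }

  // Safer: the callback runs in a finally block and reports failure
  // if anything threw between removal and completion.
  void completeSafely(long id, boolean ok, Runnable betweenSteps) {
    Consumer<Boolean> onDone = pending.remove(id);
    boolean succeeded = false;
    try {
      betweenSteps.run();
      succeeded = ok;
    } finally {
      if (onDone != null) {
        onDone.accept(succeeded);
      }
    }
  }
}
```

With `completeFragile`, a thrown exception leaves the map empty and the callback uninvoked, which is indistinguishable from outside from a commit that is still in flight. With `completeSafely`, the owner is always notified and can retry.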

              People

              • Assignee:
                scwhittle Sam Whittle
                Reporter:
                scwhittle Sam Whittle
              • Votes:
                0
                Watchers:
                1

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  2h