Beam / BEAM-9651

StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.21.0
    • Component/s: runner-dataflow
    • Labels: None

    Description

      Operation ongoing in step <redacted> for at least 28h10m00s without outputting or completing in state windmill-read
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.Phaser$QNode.block(Phaser.java:1140)
        at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
        at java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067)
        at java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758)
        at org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49)
        at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615)
        at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946)
        at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628)
        at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.<init>(GrpcWindmillServer.java:941)
        at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506)
        at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown Source)
        at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:159)
        at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:158)
        at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191)
        at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199)
        at org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433)
        at org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328)
        at org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389)
        at <redacted>

      Because the stream is started inside a synchronized block in StreamPool, every other thread that calls into StreamPool to get or release a stream ends up blocked as well.
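
      The effect described above can be reproduced in isolation. The following is a minimal sketch, not the Beam code: BlockingPoolSketch, getStream, and releaseStream are hypothetical stand-ins, and the supplier that never returns plays the role of a stream constructor stuck in onNext. It only illustrates that a blocking call made while holding a monitor stalls every other synchronized method on the same object.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical stand-in for a stream pool; this is NOT the Beam StreamPool. */
public class BlockingPoolSketch {
  private final Supplier<String> supplier;
  private String cached;

  BlockingPoolSketch(Supplier<String> supplier) {
    this.supplier = supplier;
  }

  // The supplier runs while the pool monitor is held, so a supplier that
  // never returns keeps every other synchronized method waiting.
  synchronized String getStream() {
    if (cached == null) {
      cached = supplier.get();
    }
    return cached;
  }

  // Does no work, but still needs the same monitor as getStream().
  synchronized void releaseStream(String stream) {}

  /** Returns true if releaseStream() finished within waitMillis while getStream() was stuck. */
  static boolean releaseCompletesWhileGetIsStuck(long waitMillis) {
    CountDownLatch insideSupplier = new CountDownLatch(1);
    CountDownLatch neverReady = new CountDownLatch(1);
    BlockingPoolSketch pool = new BlockingPoolSketch(() -> {
      insideSupplier.countDown();
      try {
        neverReady.await(); // simulates a stream that never becomes usable
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return "stream";
    });

    Thread getter = new Thread(pool::getStream);
    getter.setDaemon(true);
    getter.start();

    CountDownLatch released = new CountDownLatch(1);
    try {
      insideSupplier.await(); // the getter thread now holds the pool monitor
      Thread releaser = new Thread(() -> {
        pool.releaseStream("other");
        released.countDown();
      });
      releaser.setDaemon(true);
      releaser.start();
      return released.await(waitMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      neverReady.countDown(); // let both helper threads finish
    }
  }

  public static void main(String[] args) {
    System.out.println("release completed while get was stuck: "
        + releaseCompletesWhileGetIsStuck(200));
  }
}
```

      One possible design response, offered only as an assumption about what could help, is to construct the stream outside the lock and take the monitor only to update the pool's bookkeeping.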

      It is unclear whether the stream never became usable, so that onNext blocked forever, or whether a race in the use of the Phaser causes the hang.
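
      To make the two hypotheses concrete, here is a minimal sketch of Phaser-gated sending. It is not the Beam DirectStreamObserver: GatedSenderSketch, onReady, and send are hypothetical names, and the bounded wait is an illustrative assumption rather than the fix that was applied. The trace shows LockSupport.park rather than parkNanos, which suggests the original wait was untimed. Reading the phase before checking readiness is what guards against a lost wakeup in this sketch, and the timeout turns a stream that never becomes ready into a visible failure instead of an indefinite park.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of Phaser-gated sends; this is NOT the Beam DirectStreamObserver. */
public class GatedSenderSketch {
  // One registered party: each arrive() from the readiness callback advances the phase.
  private final Phaser phaser = new Phaser(1);
  private final BooleanSupplier isReady;

  GatedSenderSketch(BooleanSupplier isReady) {
    this.isReady = isReady;
  }

  /** Called by the (simulated) transport when it becomes writable. */
  void onReady() {
    phaser.arrive();
  }

  /** Waits at most timeoutMillis for readiness; returns false if the wait timed out. */
  boolean send(String message, long timeoutMillis) {
    // Read the phase BEFORE checking readiness. If onReady() fires between the
    // check and the wait, the phase has already advanced and the wait returns
    // immediately instead of missing the wakeup.
    int phase = phaser.getPhase();
    if (!isReady.getAsBoolean()) {
      try {
        phaser.awaitAdvanceInterruptibly(phase, timeoutMillis, TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        return false; // the stream never became ready within the bound
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    // A real sender would write the message to the stream here.
    return true;
  }

  /** The transport never signals readiness, so the send gives up after the timeout. */
  static boolean sendWhenNeverReady(long timeoutMillis) {
    return new GatedSenderSketch(() -> false).send("m", timeoutMillis);
  }

  /** The transport becomes ready after delayMillis and signals it through onReady(). */
  static boolean sendWhenReadyArrivesLater(long delayMillis, long timeoutMillis) {
    AtomicBoolean ready = new AtomicBoolean(false);
    GatedSenderSketch sender = new GatedSenderSketch(ready::get);
    Thread transport = new Thread(() -> {
      try {
        Thread.sleep(delayMillis);
      } catch (InterruptedException e) {
        return;
      }
      ready.set(true); // set the flag first, then advance the phase
      sender.onReady();
    });
    transport.setDaemon(true);
    transport.start();
    return sender.send("m", timeoutMillis);
  }

  public static void main(String[] args) {
    System.out.println("never ready: " + sendWhenNeverReady(100));
    System.out.println("ready later: " + sendWhenReadyArrivesLater(50, 5000));
  }
}
```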

          People

            Assignee: Sam Whittle (scwhittle)
            Reporter: Sam Whittle (scwhittle)
            Votes: 0
            Watchers: 1

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 3h 50m