Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-8980

Running GroupByKeyLoadTest on Portable Flink fails

Details

    • Bug
    • Status: Open
    • P3
    • Resolution: Unresolved
    • None
    • None
    • runner-flink, testing
    • None

    Description

      When running a GBK Load test using Java harness image and JobServer image generated from master, the load test fails with a cryptic exception:

      Exception in thread "main" java.lang.RuntimeException: Invalid job state: FAILED.
      11:45:31 	at org.apache.beam.sdk.loadtests.JobFailure.handleFailure(JobFailure.java:55)
      11:45:31 	at org.apache.beam.sdk.loadtests.LoadTest.run(LoadTest.java:106)
      11:45:31 	at org.apache.beam.sdk.loadtests.CombineLoadTest.run(CombineLoadTest.java:66)
      11:45:31 	at org.apache.beam.sdk.loadtests.CombineLoadTest.main(CombineLoadTest.java:169)
      

       

      After some investigation, I found a stacktrace of the error:

      org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: CANCELLED: call already cancelledorg.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: CANCELLED: call already cancelled at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524) at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339) at org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98) at org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.flush(BeamFnDataSizeBasedBufferingOutboundObserver.java:90) at org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:102) at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:278) at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:201) at org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103) at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504) at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369) at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530) at java.lang.Thread.run(Thread.java:748) Suppressed: org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: CANCELLED: call already cancelled at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524) at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339) at org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98) at org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.close(BeamFnDataSizeBasedBufferingOutboundObserver.java:84) at org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:298) at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.$closeResource(FlinkExecutableStageFunction.java:202) at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:202) ... 6 more Suppressed: java.lang.IllegalStateException: Processing bundle failed, TODO: [BEAM-3962] abort bundle. at org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:320) ... 8 more
      

      It seems that the core issue is an IllegalStateException thrown from SdkHarnessClient.java:320, related to BEAM-3962.

       It is important to note that the stacktrace above comes from the Flink cluster, not from the Gradle job that was executed.

      The link to Jenkins job is here: https://builds.apache.org/job/beam_LoadTests_Java_GBK_Flink_Batch_PR/28/console

      Attachments

        Activity

          People

            Unassigned Unassigned
            mwalenia Michał Walenia
            Votes:
            1 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated: