Flink / FLINK-25069

YARNHighAvailabilityITCase.testJobRecoversAfterKillingTaskManager failed on AZP


Details

    Description

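      The test YARNHighAvailabilityITCase.testJobRecoversAfterKillingTaskManager fails on AZP (Azure Pipelines) because the submission of its JobGraph is rejected with a DuplicateJobSubmissionException. In the same run, the other two tests of the class, testKillYarnSessionClusterEntrypoint and testClusterClientRetrieval, time out after 1800000 ms (30 minutes) while deploying the session cluster: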

      2021-11-25T18:28:27.9848753Z Nov 25 18:28:27 [ERROR] Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 3,676.541 s <<< FAILURE! - in org.apache.flink.yarn.YARNHighAvailabilityITCase
      2021-11-25T18:28:27.9849967Z Nov 25 18:28:27 [ERROR] org.apache.flink.yarn.YARNHighAvailabilityITCase.testJobRecoversAfterKillingTaskManager  Time elapsed: 70.846 s  <<< ERROR!
      2021-11-25T18:28:27.9850929Z Nov 25 18:28:27 java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
      2021-11-25T18:28:27.9854591Z Nov 25 18:28:27 	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
      2021-11-25T18:28:27.9855441Z Nov 25 18:28:27 	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
      2021-11-25T18:28:27.9856301Z Nov 25 18:28:27 	at org.apache.flink.yarn.YARNHighAvailabilityITCase.submitJob(YARNHighAvailabilityITCase.java:378)
      2021-11-25T18:28:27.9857202Z Nov 25 18:28:27 	at org.apache.flink.yarn.YARNHighAvailabilityITCase.lambda$testJobRecoversAfterKillingTaskManager$1(YARNHighAvailabilityITCase.java:204)
      2021-11-25T18:28:27.9858300Z Nov 25 18:28:27 	at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:288)
      2021-11-25T18:28:27.9859245Z Nov 25 18:28:27 	at org.apache.flink.yarn.YARNHighAvailabilityITCase.testJobRecoversAfterKillingTaskManager(YARNHighAvailabilityITCase.java:197)
      2021-11-25T18:28:27.9860026Z Nov 25 18:28:27 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      2021-11-25T18:28:27.9860705Z Nov 25 18:28:27 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      2021-11-25T18:28:27.9861466Z Nov 25 18:28:27 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      2021-11-25T18:28:27.9862158Z Nov 25 18:28:27 	at java.lang.reflect.Method.invoke(Method.java:498)
      2021-11-25T18:28:27.9863016Z Nov 25 18:28:27 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
      2021-11-25T18:28:27.9863959Z Nov 25 18:28:27 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      2021-11-25T18:28:27.9864829Z Nov 25 18:28:27 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
      2021-11-25T18:28:27.9865604Z Nov 25 18:28:27 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      2021-11-25T18:28:27.9866300Z Nov 25 18:28:27 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
      2021-11-25T18:28:27.9867044Z Nov 25 18:28:27 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
      2021-11-25T18:28:27.9867692Z Nov 25 18:28:27 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      2021-11-25T18:28:27.9868220Z Nov 25 18:28:27 	at java.lang.Thread.run(Thread.java:748)
      2021-11-25T18:28:27.9869072Z Nov 25 18:28:27 	Suppressed: java.lang.AssertionError: There is at least one application on the cluster that is not finished.[App application_1637861234319_0001 is in state RUNNING.]
      2021-11-25T18:28:27.9870263Z Nov 25 18:28:27 		at org.junit.Assert.fail(Assert.java:89)
      2021-11-25T18:28:27.9870862Z Nov 25 18:28:27 		at org.apache.flink.yarn.YarnTestBase$CleanupYarnApplication.close(YarnTestBase.java:325)
      2021-11-25T18:28:27.9871516Z Nov 25 18:28:27 		at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:289)
      2021-11-25T18:28:27.9871986Z Nov 25 18:28:27 		... 13 more
      2021-11-25T18:28:27.9872665Z Nov 25 18:28:27 Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
      2021-11-25T18:28:27.9873393Z Nov 25 18:28:27 	at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$11(RestClusterClient.java:433)
      2021-11-25T18:28:27.9874102Z Nov 25 18:28:27 	at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:884)
      2021-11-25T18:28:27.9874774Z Nov 25 18:28:27 	at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:866)
      2021-11-25T18:28:27.9875454Z Nov 25 18:28:27 	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
      2021-11-25T18:28:27.9876123Z Nov 25 18:28:27 	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
      2021-11-25T18:28:27.9876837Z Nov 25 18:28:27 	at org.apache.flink.util.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:373)
      2021-11-25T18:28:27.9877539Z Nov 25 18:28:27 	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
      2021-11-25T18:28:27.9878393Z Nov 25 18:28:27 	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
      2021-11-25T18:28:27.9879043Z Nov 25 18:28:27 	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
      2021-11-25T18:28:27.9879768Z Nov 25 18:28:27 	at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:575)
      2021-11-25T18:28:27.9880461Z Nov 25 18:28:27 	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:943)
      2021-11-25T18:28:27.9881229Z Nov 25 18:28:27 	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
      2021-11-25T18:28:27.9881883Z Nov 25 18:28:27 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      2021-11-25T18:28:27.9882700Z Nov 25 18:28:27 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      2021-11-25T18:28:27.9883223Z Nov 25 18:28:27 	... 1 more
      2021-11-25T18:28:27.9883780Z Nov 25 18:28:27 Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Internal server error., <Exception on server side:
      2021-11-25T18:28:27.9884529Z Nov 25 18:28:27 org.apache.flink.runtime.client.DuplicateJobSubmissionException: Job has already been submitted.
      2021-11-25T18:28:27.9885242Z Nov 25 18:28:27 	at org.apache.flink.runtime.client.DuplicateJobSubmissionException.of(DuplicateJobSubmissionException.java:29)
      2021-11-25T18:28:27.9885954Z Nov 25 18:28:27 	at org.apache.flink.runtime.dispatcher.Dispatcher.submitJob(Dispatcher.java:320)
      2021-11-25T18:28:27.9886536Z Nov 25 18:28:27 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      2021-11-25T18:28:27.9887090Z Nov 25 18:28:27 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      2021-11-25T18:28:27.9887751Z Nov 25 18:28:27 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      2021-11-25T18:28:27.9888357Z Nov 25 18:28:27 	at java.lang.reflect.Method.invoke(Method.java:498)
      2021-11-25T18:28:27.9888989Z Nov 25 18:28:27 	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316)
      2021-11-25T18:28:27.9889817Z Nov 25 18:28:27 	at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
      2021-11-25T18:28:27.9890560Z Nov 25 18:28:27 	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314)
      2021-11-25T18:28:27.9891256Z Nov 25 18:28:27 	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
      2021-11-25T18:28:27.9891961Z Nov 25 18:28:27 	at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78)
      2021-11-25T18:28:27.9892834Z Nov 25 18:28:27 	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
      2021-11-25T18:28:27.9893462Z Nov 25 18:28:27 	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
      2021-11-25T18:28:27.9894044Z Nov 25 18:28:27 	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
      2021-11-25T18:28:27.9894632Z Nov 25 18:28:27 	at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
      2021-11-25T18:28:27.9895213Z Nov 25 18:28:27 	at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
      2021-11-25T18:28:27.9895795Z Nov 25 18:28:27 	at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
      2021-11-25T18:28:27.9896393Z Nov 25 18:28:27 	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
      2021-11-25T18:28:27.9896996Z Nov 25 18:28:27 	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
      2021-11-25T18:28:27.9897602Z Nov 25 18:28:27 	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
      2021-11-25T18:28:27.9898166Z Nov 25 18:28:27 	at akka.actor.Actor.aroundReceive(Actor.scala:537)
      2021-11-25T18:28:27.9898683Z Nov 25 18:28:27 	at akka.actor.Actor.aroundReceive$(Actor.scala:535)
      2021-11-25T18:28:27.9899307Z Nov 25 18:28:27 	at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
      2021-11-25T18:28:27.9900000Z Nov 25 18:28:27 	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
      2021-11-25T18:28:27.9900547Z Nov 25 18:28:27 	at akka.actor.ActorCell.invoke(ActorCell.scala:548)
      2021-11-25T18:28:27.9901085Z Nov 25 18:28:27 	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
      2021-11-25T18:28:27.9901616Z Nov 25 18:28:27 	at akka.dispatch.Mailbox.run(Mailbox.scala:231)
      2021-11-25T18:28:27.9902200Z Nov 25 18:28:27 	at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
      2021-11-25T18:28:27.9902967Z Nov 25 18:28:27 	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
      2021-11-25T18:28:27.9903587Z Nov 25 18:28:27 	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
      2021-11-25T18:28:27.9904182Z Nov 25 18:28:27 	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
      2021-11-25T18:28:27.9904805Z Nov 25 18:28:27 	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
      2021-11-25T18:28:27.9905290Z Nov 25 18:28:27 
      2021-11-25T18:28:27.9905666Z Nov 25 18:28:27 End of exception on server side>]
      2021-11-25T18:28:27.9906179Z Nov 25 18:28:27 	at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:532)
      2021-11-25T18:28:27.9906842Z Nov 25 18:28:27 	at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:512)
      2021-11-25T18:28:27.9907507Z Nov 25 18:28:27 	at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:966)
      2021-11-25T18:28:27.9908163Z Nov 25 18:28:27 	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:940)
      2021-11-25T18:28:27.9908681Z Nov 25 18:28:27 	... 4 more
      2021-11-25T18:28:27.9909001Z Nov 25 18:28:27 
      2021-11-25T18:28:27.9909632Z Nov 25 18:28:27 [ERROR] org.apache.flink.yarn.YARNHighAvailabilityITCase.testKillYarnSessionClusterEntrypoint  Time elapsed: 1,800.315 s  <<< ERROR!
      2021-11-25T18:28:27.9910379Z Nov 25 18:28:27 org.junit.runners.model.TestTimedOutException: test timed out after 1800000 milliseconds
      2021-11-25T18:28:27.9910924Z Nov 25 18:28:27 	at java.lang.Thread.sleep(Native Method)
      2021-11-25T18:28:27.9911487Z Nov 25 18:28:27 	at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:1240)
      2021-11-25T18:28:27.9912182Z Nov 25 18:28:27 	at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:607)
      2021-11-25T18:28:27.9913034Z Nov 25 18:28:27 	at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:419)
      2021-11-25T18:28:27.9913782Z Nov 25 18:28:27 	at org.apache.flink.yarn.YARNHighAvailabilityITCase.deploySessionCluster(YARNHighAvailabilityITCase.java:364)
      2021-11-25T18:28:27.9914595Z Nov 25 18:28:27 	at org.apache.flink.yarn.YARNHighAvailabilityITCase.lambda$testKillYarnSessionClusterEntrypoint$0(YARNHighAvailabilityITCase.java:174)
      2021-11-25T18:28:27.9915326Z Nov 25 18:28:27 	at org.apache.flink.yarn.YARNHighAvailabilityITCase$$Lambda$503/1259621657.run(Unknown Source)
      2021-11-25T18:28:27.9915947Z Nov 25 18:28:27 	at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:288)
      2021-11-25T18:28:27.9916650Z Nov 25 18:28:27 	at org.apache.flink.yarn.YARNHighAvailabilityITCase.testKillYarnSessionClusterEntrypoint(YARNHighAvailabilityITCase.java:162)
      2021-11-25T18:28:27.9917328Z Nov 25 18:28:27 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      2021-11-25T18:28:27.9917905Z Nov 25 18:28:27 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      2021-11-25T18:28:27.9918570Z Nov 25 18:28:27 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      2021-11-25T18:28:27.9919246Z Nov 25 18:28:27 	at java.lang.reflect.Method.invoke(Method.java:498)
      2021-11-25T18:28:27.9919847Z Nov 25 18:28:27 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
      2021-11-25T18:28:27.9920514Z Nov 25 18:28:27 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      2021-11-25T18:28:27.9921293Z Nov 25 18:28:27 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
      2021-11-25T18:28:27.9921936Z Nov 25 18:28:27 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      2021-11-25T18:28:27.9922772Z Nov 25 18:28:27 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
      2021-11-25T18:28:27.9923503Z Nov 25 18:28:27 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
      2021-11-25T18:28:27.9924238Z Nov 25 18:28:27 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      2021-11-25T18:28:27.9924757Z Nov 25 18:28:27 	at java.lang.Thread.run(Thread.java:748)
      2021-11-25T18:28:27.9925156Z Nov 25 18:28:27 
      2021-11-25T18:28:27.9925694Z Nov 25 18:28:27 [ERROR] org.apache.flink.yarn.YARNHighAvailabilityITCase.testClusterClientRetrieval  Time elapsed: 1,800.087 s  <<< ERROR!
      2021-11-25T18:28:27.9926411Z Nov 25 18:28:27 org.junit.runners.model.TestTimedOutException: test timed out after 1800000 milliseconds
      2021-11-25T18:28:27.9926957Z Nov 25 18:28:27 	at java.lang.Thread.sleep(Native Method)
      2021-11-25T18:28:27.9927499Z Nov 25 18:28:27 	at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:1240)
      2021-11-25T18:28:27.9928190Z Nov 25 18:28:27 	at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:607)
      2021-11-25T18:28:27.9928899Z Nov 25 18:28:27 	at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:419)
      2021-11-25T18:28:27.9929731Z Nov 25 18:28:27 	at org.apache.flink.yarn.YARNHighAvailabilityITCase.deploySessionCluster(YARNHighAvailabilityITCase.java:364)
      2021-11-25T18:28:27.9930513Z Nov 25 18:28:27 	at org.apache.flink.yarn.YARNHighAvailabilityITCase.lambda$testClusterClientRetrieval$2(YARNHighAvailabilityITCase.java:230)
      2021-11-25T18:28:27.9931236Z Nov 25 18:28:27 	at org.apache.flink.yarn.YARNHighAvailabilityITCase$$Lambda$504/1893740748.run(Unknown Source)
      2021-11-25T18:28:27.9931852Z Nov 25 18:28:27 	at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:288)
      2021-11-25T18:28:27.9932684Z Nov 25 18:28:27 	at org.apache.flink.yarn.YARNHighAvailabilityITCase.testClusterClientRetrieval(YARNHighAvailabilityITCase.java:225)
      2021-11-25T18:28:27.9933406Z Nov 25 18:28:27 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      2021-11-25T18:28:27.9933989Z Nov 25 18:28:27 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      2021-11-25T18:28:27.9934647Z Nov 25 18:28:27 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      2021-11-25T18:28:27.9935251Z Nov 25 18:28:27 	at java.lang.reflect.Method.invoke(Method.java:498)
      2021-11-25T18:28:27.9935839Z Nov 25 18:28:27 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
      2021-11-25T18:28:27.9936502Z Nov 25 18:28:27 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      2021-11-25T18:28:27.9937158Z Nov 25 18:28:27 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
      2021-11-25T18:28:27.9937813Z Nov 25 18:28:27 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      2021-11-25T18:28:27.9938497Z Nov 25 18:28:27 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
      2021-11-25T18:28:27.9939288Z Nov 25 18:28:27 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
      2021-11-25T18:28:27.9939947Z Nov 25 18:28:27 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      2021-11-25T18:28:27.9940452Z Nov 25 18:28:27 	at java.lang.Thread.run(Thread.java:748)
      2021-11-25T18:28:27.9940854Z Nov 25 18:28:27 
      2021-11-25T18:28:28.9205416Z Nov 25 18:28:28 [ERROR] Picked up JAVA_TOOL_OPTIONS: -XX:+HeapDumpOnOutOfMemoryError
      

      https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=27085&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=995c650b-6573-581c-9ce6-7ad4cc038461&l=29849
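
The server-side cause in the first trace is a DuplicateJobSubmissionException thrown by Dispatcher.submitJob, and the client-side frames include FutureUtils.retryOperationWithDelay. One possible explanation, not confirmed by this ticket, is that an earlier submission attempt reached the Dispatcher while the client did not receive a usable response, so a later attempt with the same job ID was rejected as a duplicate. The toy model below only illustrates that pattern under this assumption; every class and method in it is hypothetical, and none of it is Flink code or Flink's actual retry logic.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model (NOT Flink code): a "dispatcher" that rejects a job ID it has
// already accepted, and a client that retries when it does not see a reply.
public class DuplicateSubmissionSketch {

    /** Hypothetical stand-in for Flink's DuplicateJobSubmissionException. */
    static class DuplicateSubmission extends RuntimeException {
        DuplicateSubmission(String jobId) {
            super("Job has already been submitted: " + jobId);
        }
    }

    /** Hypothetical server side: remembers every job ID it has accepted. */
    static class ToyDispatcher {
        private final Set<String> accepted = new HashSet<>();

        void submit(String jobId) {
            // Set.add returns false if the ID was already present.
            if (!accepted.add(jobId)) {
                throw new DuplicateSubmission(jobId);
            }
        }
    }

    /**
     * Hypothetical client: the first attempt reaches the dispatcher but its
     * reply is treated as lost (simulated), so the same job ID is sent again.
     * Returns the error message the caller finally sees, or "ok".
     */
    static String submitWithRetry(ToyDispatcher dispatcher, String jobId, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                dispatcher.submit(jobId);
                if (attempt == 1) {
                    // Simulated lost/timed-out response: the server accepted
                    // the job, but the client counts the attempt as failed.
                    continue;
                }
                return "ok";
            } catch (DuplicateSubmission e) {
                // Not retried: surface the server-side rejection to the caller.
                return e.getMessage();
            }
        }
        return "retries exhausted";
    }

    public static void main(String[] args) {
        ToyDispatcher dispatcher = new ToyDispatcher();
        System.out.println(submitWithRetry(dispatcher, "job-1", 3));
    }
}
```

Running main prints "Job has already been submitted: job-1". The point of the model is that submission keyed by job ID is not idempotent: a retry after a lost reply turns an accepted submission into a reported failure.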

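In the first trace, the JobSubmissionException carries a suppressed AssertionError raised from YarnTestBase$CleanupYarnApplication.close (YarnTestBase.java:325), which is called from runTest (YarnTestBase.java:289) right after the test body at line 288. That shape matches Java's try-with-resources semantics: when both the body and close() throw, the body's exception propagates and the close() exception is attached to it via addSuppressed. It therefore seems likely that the "application ... is in state RUNNING" assertion is a consequence of the failed submission rather than an independent failure. A minimal sketch of the mechanism, with hypothetical class names (not Flink's YarnTestBase):

```java
// Minimal sketch (hypothetical names, not Flink's YarnTestBase): shows how
// try-with-resources reports a failing close() as a suppressed exception.
public class SuppressedCleanupSketch {

    /** Hypothetical stand-in for a cleanup resource whose close() asserts. */
    static class CleanupCheck implements AutoCloseable {
        @Override
        public void close() {
            throw new AssertionError(
                    "There is at least one application on the cluster that is not finished.");
        }
    }

    /** Runs a failing "test body" inside try-with-resources and returns what the caller sees. */
    static Throwable runTest() {
        try (CleanupCheck ignored = new CleanupCheck()) {
            // The body fails first; close() then also fails.
            throw new IllegalStateException("Failed to submit JobGraph.");
        } catch (Throwable t) {
            // The catch clause runs after the resource has been closed, so the
            // AssertionError from close() is already attached as suppressed.
            return t;
        }
    }

    public static void main(String[] args) {
        Throwable primary = runTest();
        System.out.println("primary: " + primary.getMessage());
        for (Throwable s : primary.getSuppressed()) {
            System.out.println("suppressed: " + s.getMessage());
        }
    }
}
```

Running main prints the primary message (Failed to submit JobGraph.) followed by the suppressed cleanup assertion, mirroring the order seen in the CI log.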
          People

            mapohl Matthias Pohl
            trohrmann Till Rohrmann
            Votes: 0
            Watchers: 2
