Flink / FLINK-21964

HA e2e test failed due to not enough resources being available with Adaptive Scheduler

Details

    Description

      This build failed (not exclusively) because of a failure in the "Running HA per-job cluster (rocks, non-incremental) end-to-end test" e2e test.

      We again ran into the problem of not enough resources being available.

      Caused by: org.apache.flink.runtime.client.JobExecutionException: Not enough resources available for scheduling.
              at org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.lambda$determineParallelism$20(AdaptiveScheduler.java:600) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at java.util.Optional.orElseThrow(Optional.java:290) ~[?:1.8.0_282]
              at org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.determineParallelism(AdaptiveScheduler.java:597) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.createExecutionGraphWithAvailableResourcesAsync(AdaptiveScheduler.java:729) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.goToCreatingExecutionGraph(AdaptiveScheduler.java:716) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at org.apache.flink.runtime.scheduler.adaptive.WaitingForResources.createExecutionGraphWithAvailableResources(WaitingForResources.java:112) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at org.apache.flink.runtime.scheduler.adaptive.WaitingForResources.resourceTimeout(WaitingForResources.java:108) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.runIfState(AdaptiveScheduler.java:894) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.lambda$runIfState$24(AdaptiveScheduler.java:909) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_282]
              at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_282]
              at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at akka.actor.Actor$class.aroundReceive(Actor.scala:517) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at akka.actor.ActorCell.invoke(ActorCell.scala:561) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at akka.dispatch.Mailbox.run(Mailbox.scala:225) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              at akka.dispatch.Mailbox.exec(Mailbox.scala:235) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
              ... 4 more
      

      I attached the build artifacts to this issue. The stack trace listed above can be found in 20210322.5/e2e-flink-logs/flink-vsts-standalonejob-0-fv-az127-848.log.
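
      For readers less familiar with the trace: the top frames (WaitingForResources.resourceTimeout, then AdaptiveScheduler.determineParallelism, then Optional.orElseThrow) suggest that the resource timeout fired while no feasible parallelism could be determined, so an empty Optional was turned into the exception. The following is only a minimal sketch of that pattern, not the actual Flink code. The class ResourceTimeoutSketch, the exception type, both method signatures, and the slot-counting rule are hypothetical and were chosen for illustration; only the error message is taken from the log.

```java
import java.util.Optional;

public class ResourceTimeoutSketch {

    /** Hypothetical stand-in for the JobExecutionException seen in the log. */
    static class NotEnoughResourcesException extends RuntimeException {
        NotEnoughResourcesException(String message) {
            super(message);
        }
    }

    /**
     * Hypothetical helper: returns the parallelism that fits into the free
     * slots, or an empty Optional when no slot is available at all.
     * The real scheduler's rule is not shown in the issue; this is an assumption.
     */
    static Optional<Integer> determineParallelism(int freeSlots, int desiredParallelism) {
        if (freeSlots < 1) {
            return Optional.empty();
        }
        return Optional.of(Math.min(freeSlots, desiredParallelism));
    }

    /** Mirrors the Optional.orElseThrow frame: an empty result becomes an exception. */
    static int parallelismOrFail(int freeSlots, int desiredParallelism) {
        return determineParallelism(freeSlots, desiredParallelism)
                .orElseThrow(() -> new NotEnoughResourcesException(
                        "Not enough resources available for scheduling."));
    }

    public static void main(String[] args) {
        // Fewer slots than desired: the job would run with reduced parallelism.
        System.out.println(parallelismOrFail(2, 4));

        // No slots when the timeout fires: the failure path from the trace.
        try {
            parallelismOrFail(0, 4);
        } catch (NotEnoughResourcesException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      In this sketch the failure only occurs when zero slots are free at the moment of the check, which would be consistent with TaskManagers registering too late in the test. Whether that is what happened here would have to be confirmed from the attached logs.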

      Attachments

        1. logs-ci_build_adaptive-e2e-logs.zip
          164 kB
          Matthias Pohl

            People

              Assignee: Unassigned
              Reporter: Matthias Pohl (mapohl)
              Votes: 0
              Watchers: 3
