Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-1952

Cannot run ConnectedComponents example: Could not allocate a slot on instance

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Blocker
    • Resolution: Fixed
    • 0.9
    • 0.9
    • Runtime / Coordination
    • None

    Description

      Steps to reproduce

      ./bin/yarn-session.sh -n 350 
      

      ... wait until they are connected ...

      Number of connected TaskManagers changed to 266. Slots available: 266
      Number of connected TaskManagers changed to 323. Slots available: 323
      Number of connected TaskManagers changed to 334. Slots available: 334
      Number of connected TaskManagers changed to 343. Slots available: 343
      Number of connected TaskManagers changed to 350. Slots available: 350
      

      Start CC

      ./bin/flink run -p 350 ./examples/flink-java-examples-0.9-SNAPSHOT-ConnectedComponents.jar
      

      ---> it runs

      Run KMeans, let it fail with

      Failed to deploy the task Map (Map at main(KMeans.java:100)) (1/350) - execution #0 to slot SimpleSlot (2)(2)(0) - 182b7661ca9547a84591de940c47a200 - ALLOCATED/ALIVE: java.io.IOException: Insufficient number of network buffers: required 350, but only 254 available. The total number of network buffers is currently set to 2048. You can increase this number by setting the configuration key 'taskmanager.network.numberOfBuffers'.
      

      ... as expected.

      (I've waited for 10 minutes between the two submissions)

      Starting CC now will fail:

      ./bin/flink run -p 350 ./examples/flink-java-examples-0.9-SNAPSHOT-ConnectedComponents.jar 
      

      Error message(s):

      Caused by: java.lang.IllegalStateException: Could not schedule consumer vertex IterationHead(WorksetIteration (Unnamed Delta Iteration)) (19/350)
      	at org.apache.flink.runtime.executiongraph.Execution$3.call(Execution.java:479)
      	at org.apache.flink.runtime.executiongraph.Execution$3.call(Execution.java:469)
      	at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94)
      	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
      	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
      	at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
      	... 4 more
      Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate a slot on instance 4a6d761cb084c32310ece1f849556faf @ cloud-19 - 1 slots - URL: akka.tcp://flink@130.149.21.23:51400/user/taskmanager, as required by the co-location constraint.
      	at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:247)
      	at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:110)
      	at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:262)
      	at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:436)
      	at org.apache.flink.runtime.executiongraph.Execution$3.call(Execution.java:475)
      	... 9 more
      

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            sewen Stephan Ewen
            rmetzger Robert Metzger
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment