Flink / FLINK-17560

No Slots available exception in Apache Flink Job Manager while Scheduling

Details

    Description

      Set up
      ------
      Flink version 1.8.3

      Zookeeper HA cluster

      1 ResourceManager/Dispatcher (Same Node)
      1 TaskManager
      4 pipelines running with various parallelisms

      Issue
      ------

      Occasionally, when the Job Manager is restarted, we noticed that not all of the pipelines get scheduled. The error reported by the Job Manager is 'not enough slots are available'. This should not be the case, because the task manager was deployed with sufficient slots for the number of pipelines and their parallelism.

      We further noticed that the slot report sent by the task manager contains slots filled with old CANCELLED job IDs. I am not sure why the task manager still holds the details of the old jobs. A thread dump on the task manager confirms that the old pipelines are not running.

      I am aware of https://issues.apache.org/jira/browse/FLINK-12865, but that is not the issue happening in this case.
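To make the symptom concrete, here is a minimal sketch of how stale entries in a slot report would reduce the number of slots that look free. It is not Flink code: `SlotReportCheck`, `SlotEntry`, `freeSlots`, `staleSlots`, and the job IDs are all hypothetical names invented for illustration, and the real slot report classes in Flink 1.8.3 may be structured differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a slot report; NOT the Flink API.
public class SlotReportCheck {

    /** Hypothetical stand-in for one entry in a task manager slot report. */
    public static final class SlotEntry {
        public final int slotIndex;
        public final String jobId; // null means the slot is reported as free

        public SlotEntry(int slotIndex, String jobId) {
            this.slotIndex = slotIndex;
            this.jobId = jobId;
        }
    }

    /** Counts the slots that the report shows as unallocated. */
    public static int freeSlots(List<SlotEntry> report) {
        int free = 0;
        for (SlotEntry entry : report) {
            if (entry.jobId == null) {
                free++;
            }
        }
        return free;
    }

    /** Returns the entries still assigned to a job that is no longer running. */
    public static List<SlotEntry> staleSlots(List<SlotEntry> report, Set<String> runningJobIds) {
        List<SlotEntry> stale = new ArrayList<>();
        for (SlotEntry entry : report) {
            if (entry.jobId != null && !runningJobIds.contains(entry.jobId)) {
                stale.add(entry);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        // Made-up report: two slots still carry IDs of cancelled jobs.
        List<SlotEntry> report = Arrays.asList(
                new SlotEntry(0, "cancelled-job-1"),
                new SlotEntry(1, "cancelled-job-2"),
                new SlotEntry(2, null),
                new SlotEntry(3, "running-job-1"));
        Set<String> running = new HashSet<>(Arrays.asList("running-job-1"));

        System.out.println("free slots: " + freeSlots(report));                    // free slots: 1
        System.out.println("stale slots: " + staleSlots(report, running).size()); // stale slots: 2
    }
}
```

In this made-up report, only one of four slots appears free even though only one job is actually running, which is the kind of mismatch that would produce a 'not enough slots are available' error despite sufficient deployed capacity.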

      Attachments

        1. jobmgr.log
          10.72 MB
          josson paul kalapparambath
        2. threaddump-tm.txt
          761 kB
          josson paul kalapparambath
        3. tm.log
          14.38 MB
          josson paul kalapparambath

            People

              Assignee: Unassigned
              Reporter: josson paul kalapparambath
              Votes: 0
              Watchers: 7