  1. Flink
  2. FLINK-14074

MesosResourceManager can't create new taskmanagers in Session Cluster Mode.

    Details

      Description

      Hi, I'm trying to launch multiple jobs in a Flink session cluster deployed on Mesos.
      The Flink version is 1.9.0.

      The very first resource allocation completes successfully and the first submitted job launches, but submitting any number of jobs afterwards has no effect on the cluster: no additional TaskManagers are allocated.

      From the logs I can see that MesosResourceManager is requesting slots for the newly submitted jobs ("o.a.f.m.r.c.MesosResourceManager - Request slot with profile ResourceProfile..."), but the line "Starting a new worker." appears in the log only as many times as there are TaskManagers allocated for the first job.

      I'm a complete noob in Flink internals, but I took a wild guess at the reason. I think the problem is in this check: https://github.com/apache/flink/blob/release-1.9.0/flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/MesosResourceManager.java#L436

      It might be that the RM is lazily created by a factory on the first call, at which point the private final field slotsPerWorker is set. This check would then prevent the creation of any new worker once the iterator has traversed the entire collection. My main assumption is that slotsPerWorker is never modified again.
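
A minimal sketch of the kind of guard described above may make the hypothesis easier to test. It is a toy model, not code copied from Flink: the class WorkerGuardSketch, the Profile type, its fields, and the matching rule are all hypothetical stand-ins, and only the field name slotsPerWorker is taken from the description.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

/** Toy model of the suspected guard. Not Flink code: every name here is hypothetical. */
final class WorkerGuardSketch {

    /** Simplified stand-in for a slot resource profile. */
    static final class Profile {
        final double cpuCores;
        final int memoryMb;

        Profile(double cpuCores, int memoryMb) {
            this.cpuCores = cpuCores;
            this.memoryMb = memoryMb;
        }

        /** True if this profile is large enough to satisfy the requested one. */
        boolean isMatching(Profile requested) {
            return cpuCores >= requested.cpuCores && memoryMb >= requested.memoryMb;
        }
    }

    // Set once in the constructor and never modified afterwards.
    private final Collection<Profile> slotsPerWorker;
    private int workersStarted = 0;

    WorkerGuardSketch(Collection<Profile> slotsPerWorker) {
        this.slotsPerWorker = slotsPerWorker;
    }

    /** Returns the new worker's slots, or an empty collection if the guard rejects the request. */
    Collection<Profile> startNewWorker(Profile requested) {
        // Guard: compare the request against the first configured slot profile.
        if (!slotsPerWorker.iterator().next().isMatching(requested)) {
            return Collections.emptyList(); // rejected: no worker is started
        }
        workersStarted++;
        return slotsPerWorker;
    }

    int workersStarted() {
        return workersStarted;
    }

    public static void main(String[] args) {
        WorkerGuardSketch rm = new WorkerGuardSketch(Arrays.asList(new Profile(1.0, 1024)));
        rm.startNewWorker(new Profile(1.0, 1024)); // matches the configured profile
        rm.startNewWorker(new Profile(1.0, 1024)); // a second matching request
        rm.startNewWorker(new Profile(4.0, 8192)); // larger than a configured slot
        System.out.println("workers started: " + rm.workersStarted());
    }
}
```

In this toy model each call obtains a fresh iterator, so the guard does not wear out after the first allocation; it only rejects requests that the fixed profile cannot satisfy. Whether the real check in MesosResourceManager behaves the same way would need to be confirmed against the linked source.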

       

      I'm sorry that I didn't investigate much before reporting, but I'll try to do so after the weekend. I plan to build Flink without this check and see if it helps. I'll also play around with the tests for this RM. Since it's my first time working with Flink internals, I'll be back after a few days.

      Any help will be much appreciated.

      Thanks in advance.

              People

              • Assignee: Till Rohrmann (trohrmann)
              • Reporter: Alexander Kasyanenko (Atlaster)
              • Votes: 1
              • Watchers: 4

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m