Flink / FLINK-15082

Mesos App Master does not respect taskmanager.memory.total-process.size


Details

    Description

      When the Mesos App Master is started with taskmanager.memory.total-process.size set, the configured value is not respected.

      This can be reproduced by starting the App Master with the following command:

      /bin/mesos-appmaster.sh \
      -Dtaskmanager.memory.total-process.size=2048m \
      -Djobmanager.heap.size=2048m \
      ...
      

      The ClusterEntrypoint fails with the exception shown below. The reason is that the default value of mesos.resourcemanager.tasks.mem (1024 MB) is taken as the total process memory size, instead of the value configured via taskmanager.memory.total-process.size.

      org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint MesosSessionClusterEntrypoint.
              at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:187)
              at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:518)
              at org.apache.flink.mesos.entrypoint.MesosSessionClusterEntrypoint.main(MesosSessionClusterEntrypoint.java:126)
      Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
              at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:261)
              at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:215)
              at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:422)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
              at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
              at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
              ... 2 more
      Caused by: org.apache.flink.configuration.IllegalConfigurationException: Sum of configured Framework Heap Memory (134217728 bytes), Framework Off-Heap Memory (134217728 bytes), Task Off-Heap Memory (0 bytes), Managed Memory (719407031 bytes) and Shuffle Memory (80530638 bytes) exceed configured Total Flink Memory (805306368 bytes).
              at org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.deriveInternalMemoryFromTotalFlinkMemory(TaskExecutorResourceUtils.java:273)
              at org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.deriveResourceSpecWithTotalProcessMemory(TaskExecutorResourceUtils.java:210)
              at org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:108)
              at org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:94)
              at org.apache.flink.mesos.runtime.clusterframework.MesosTaskManagerParameters.create(MesosTaskManagerParameters.java:341)
              at org.apache.flink.mesos.util.MesosUtils.createTmParameters(MesosUtils.java:109)
              at org.apache.flink.mesos.runtime.clusterframework.MesosResourceManagerFactory.createActiveResourceManager(MesosResourceManagerFactory.java:80)
              at org.apache.flink.runtime.resourcemanager.ActiveResourceManagerFactory.createResourceManager(ActiveResourceManagerFactory.java:58)
              at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:170)
              ... 9 more
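      To make the failure concrete, the small Java program here simply re-adds the byte values reported in the IllegalConfigurationException. The class name FlinkMemorySumCheck is hypothetical, and the program only checks the arithmetic from the message; it does not reproduce how TaskExecutorResourceUtils derives the individual components.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical check: re-adds the byte values copied verbatim from the
// IllegalConfigurationException message. Not Flink code.
final class FlinkMemorySumCheck {

    // "Total Flink Memory" from the message (768 MiB).
    static final long TOTAL_FLINK_MEMORY = 805_306_368L;

    // Components in the same order as in the exception message.
    static Map<String, Long> components() {
        Map<String, Long> c = new LinkedHashMap<>();
        c.put("Framework Heap Memory", 134_217_728L);     // 128 MiB
        c.put("Framework Off-Heap Memory", 134_217_728L); // 128 MiB
        c.put("Task Off-Heap Memory", 0L);
        c.put("Managed Memory", 719_407_031L);
        c.put("Shuffle Memory", 80_530_638L);
        return c;
    }

    // Sum of all listed components, in bytes.
    static long componentSum() {
        long sum = 0;
        for (long v : components().values()) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        long sum = componentSum();
        System.out.printf("sum of components  = %d bytes%n", sum);
        System.out.printf("total Flink memory = %d bytes%n", TOTAL_FLINK_MEMORY);
        System.out.printf("overshoot          = %d bytes%n", sum - TOTAL_FLINK_MEMORY);
    }
}
```

      Running it prints a component sum of 1068373125 bytes, which exceeds the 805306368 bytes of Total Flink Memory by 263066757 bytes.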
      

      Expected Behavior

      • If taskmanager.memory.total-process.size and mesos.resourcemanager.tasks.mem are both set to different values, an exception should be thrown.
      • If only taskmanager.memory.total-process.size is set, its value should be respected.
      • If only mesos.resourcemanager.tasks.mem is set, its value should be respected.
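      The expected behavior above can be sketched as a small resolution function. This is a hypothetical helper, not Flink's implementation: the class and method names are made up, both options are assumed to be already parsed into megabytes (the real options take memory-size strings such as 2048m), IllegalArgumentException stands in for Flink's IllegalConfigurationException, and two cases the issue does not cover are filled in as assumptions: equal values are accepted, and with neither option set the 1024 MB Mesos default is used.

```java
import java.util.OptionalLong;

// Hypothetical sketch of the expected behavior described in FLINK-15082; not Flink code.
// Assumption: both inputs are already parsed into megabytes.
final class TaskManagerMemoryResolver {

    // Default of mesos.resourcemanager.tasks.mem in MB, as stated in the issue.
    static final long MESOS_TASKS_MEM_DEFAULT_MB = 1024L;

    static long resolveTotalProcessMemoryMb(OptionalLong totalProcessSizeMb,
                                            OptionalLong mesosTasksMemMb) {
        if (totalProcessSizeMb.isPresent() && mesosTasksMemMb.isPresent()) {
            long tm = totalProcessSizeMb.getAsLong();
            long mesos = mesosTasksMemMb.getAsLong();
            if (tm != mesos) {
                // Stand-in for Flink's IllegalConfigurationException.
                throw new IllegalArgumentException(
                        "taskmanager.memory.total-process.size (" + tm + " MB) and "
                                + "mesos.resourcemanager.tasks.mem (" + mesos + " MB) differ");
            }
            // Assumption: equal values are accepted (the issue only covers differing values).
            return tm;
        }
        if (totalProcessSizeMb.isPresent()) {
            // Only the TaskManager option is set: respect it.
            return totalProcessSizeMb.getAsLong();
        }
        if (mesosTasksMemMb.isPresent()) {
            // Only the Mesos option is set: respect it.
            return mesosTasksMemMb.getAsLong();
        }
        // Assumption: with neither option set, fall back to the Mesos default.
        return MESOS_TASKS_MEM_DEFAULT_MB;
    }

    public static void main(String[] args) {
        // Mirrors the reproduction command: only the TaskManager option is set.
        System.out.println(resolveTotalProcessMemoryMb(OptionalLong.of(2048), OptionalLong.empty()));
    }
}
```

      With the inputs from the reproduction command this returns 2048 rather than the 1024 MB default that currently wins.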


            People

              azagrebin Andrey Zagrebin
              gjy Gary Yao
              Votes: 0
              Watchers: 3


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m