SPARK-1930

Container is killed for running beyond physical memory limits


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.1, 1.1.0
    • Component/s: YARN
    • Labels: None

    Description

      When a container's memory usage reaches about 8 GB, YARN kills the container.
      YARN NodeManager log:

      2014-05-23 13:35:30,776 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=4947,containerID=container_1400809535638_0015_01_000005] is running beyond physical memory limits. Current usage: 8.6 GB of 8.5 GB physical memory used; 10.0 GB of 17.8 GB virtual memory used. Killing container.
      Dump of the process-tree for container_1400809535638_0015_01_000005 :
              |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
              |- 4947 25417 4947 4947 (bash) 0 0 110804992 335 /bin/bash -c /usr/java/jdk1.7.0_45-cloudera/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms8192m -Xmx8192m  -Xss2m -Djava.io.tmpdir=/yarn/nm/usercache/spark/appcache/application_1400809535638_0015/container_1400809535638_0015_01_000005/tmp  -Dlog4j.configuration=log4j-spark-container.properties -Dspark.akka.askTimeout="120" -Dspark.akka.timeout="120" -Dspark.akka.frameSize="20" org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://spark@10dian71.domain.test:45477/user/CoarseGrainedScheduler 3 10dian72.domain.test 4 1> /var/log/hadoop-yarn/container/application_1400809535638_0015/container_1400809535638_0015_01_000005/stdout 2> /var/log/hadoop-yarn/container/application_1400809535638_0015/container_1400809535638_0015_01_000005/stderr 
              |- 4957 4947 4947 4947 (java) 157809 12620 10667016192 2245522 /usr/java/jdk1.7.0_45-cloudera/bin/java -server -XX:OnOutOfMemoryError=kill %p -Xms8192m -Xmx8192m -Xss2m -Djava.io.tmpdir=/yarn/nm/usercache/spark/appcache/application_1400809535638_0015/container_1400809535638_0015_01_000005/tmp -Dlog4j.configuration=log4j-spark-container.properties -Dspark.akka.askTimeout=120 -Dspark.akka.timeout=120 -Dspark.akka.frameSize=20 org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://spark@10dian71.domain.test:45477/user/CoarseGrainedScheduler 3 10dian72.domain.test 4 
      
      2014-05-23 13:35:30,776 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Removed ProcessTree with root 4947
      2014-05-23 13:35:30,776 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1400809535638_0015_01_000005 transitioned from RUNNING to KILLING
      2014-05-23 13:35:30,777 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1400809535638_0015_01_000005
      2014-05-23 13:35:30,788 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1400809535638_0015_01_000005 is : 143
      2014-05-23 13:35:30,829 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1400809535638_0015_01_000005 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
      2014-05-23 13:35:30,830 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /yarn/nm/usercache/spark/appcache/application_1400809535638_0015/container_1400809535638_0015_01_000005
      2014-05-23 13:35:30,830 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=spark        OPERATION=Container Finished - Killed   TARGET=ContainerImpl    RESULT=SUCCESS  APPID=application_1400809535638_0015    CONTAINERID=container_1400809535638_0015_01_000005
      2014-05-23 13:35:30,830 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1400809535638_0015_01_000005 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to DONE
      2014-05-23 13:35:30,830 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_1400809535638_0015_01_000005 from application application_1400809535638_0015
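
The "8.6 GB of 8.5 GB" figure in the log can be cross-checked against the RSSMEM_USAGE(PAGES) column of the process-tree dump. The following is a rough sketch, assuming a 4 KiB page size (the log does not state the page size). The two page counts and the heap size are copied from the dump above; the variable and function names are illustrative only.

```python
# Assumption: 4 KiB memory pages (not stated in the log).
PAGE_SIZE_BYTES = 4096

# RSSMEM_USAGE(PAGES) values copied from the process-tree dump.
rss_pages = {
    "bash (pid 4947)": 335,
    "java (pid 4957)": 2245522,
}


def pages_to_gib(pages, page_size=PAGE_SIZE_BYTES):
    """Convert a resident page count to GiB."""
    return pages * page_size / 1024**3


# Total resident memory of the container's process tree.
total_gib = sum(pages_to_gib(p) for p in rss_pages.values())

# Heap ceiling from -Xmx8192m in the executor command line.
heap_gib = 8192 / 1024

# Resident memory that cannot be heap, as a lower bound in MiB.
min_off_heap_mib = (total_gib - heap_gib) * 1024

print(f"process-tree RSS: {total_gib:.2f} GiB")
print(f"resident memory beyond the 8 GiB heap: at least {min_off_heap_mib:.0f} MiB")
```

Under that page-size assumption the total rounds to the 8.6 GB reported by the NodeManager, and the resident memory that cannot be attributed to the Java heap already exceeds 384 MB.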
      

      I think this is related to YarnAllocationHandler.MEMORY_OVERHEAD:
      https://github.com/apache/spark/blob/master/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala#L562

      Relative to an 8 GB heap, a 384 MB overhead is too small.
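
To make the proportions concrete, here is a minimal sketch of the arithmetic, using the 8192 MB heap from the `-Xmx8192m` flag in the log and the 384 MB overhead mentioned above. The 512 MB rounding increment is an assumption, chosen only because it reproduces the 8.5 GB limit shown in the log; the actual YARN scheduler settings of this cluster are not given in the report. The helper names are hypothetical and are not Spark or YARN APIs.

```python
import math

HEAP_MB = 8192      # from -Xmx8192m in the executor command line
OVERHEAD_MB = 384   # fixed overhead cited in the report

# Assumption: the scheduler rounds requests up to a multiple of 512 MB.
ASSUMED_INCREMENT_MB = 512


def container_request_mb(heap_mb, overhead_mb):
    """Memory requested for the container: heap plus fixed overhead."""
    return heap_mb + overhead_mb


def round_up(mb, increment_mb):
    """Round a request up to the next multiple of the increment."""
    return math.ceil(mb / increment_mb) * increment_mb


requested_mb = container_request_mb(HEAP_MB, OVERHEAD_MB)
limit_mb = round_up(requested_mb, ASSUMED_INCREMENT_MB)
overhead_fraction = OVERHEAD_MB / HEAP_MB

print(f"requested: {requested_mb} MB")
print(f"limit after rounding: {limit_mb} MB ({limit_mb / 1024} GB)")
print(f"overhead as a share of heap: {overhead_fraction:.1%}")
```

With these numbers the fixed overhead is under 5% of the heap, so any non-heap memory the JVM uses beyond that margin (thread stacks, direct buffers, JVM internals) pushes the container over its limit.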

          People

            Assignee: Guoqiang Li
            Reporter: Guoqiang Li