  1. Hadoop YARN
  2. YARN-8242

YARN NM: OOM error while reading back the state store on recovery

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.6.0, 2.9.0, 2.6.5, 2.8.3, 3.1.0, 2.7.6, 3.0.2
    • Fix Version/s: 3.2.0, 3.1.2
    • Component/s: yarn
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      On startup the NM reads its state store and builds a list of the applications in it to process. If the state store holds a large number of applications, each with a lot of state attached, the NM can run out of memory (OOM) before it reaches the point where it can start processing the recovery.
      Because recovery never starts, the NM cannot get past this point on its own. The heap size has to be increased to get the NM started.

       

      The stack trace follows:

      at java.lang.OutOfMemoryError.<init> (OutOfMemoryError.java:48)
      at com.google.protobuf.ByteString.copyFrom (ByteString.java:192)
      at com.google.protobuf.CodedInputStream.readBytes (CodedInputStream.java:324)
      at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto.<init> (YarnProtos.java:47069)
      at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto.<init> (YarnProtos.java:47014)
      at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom (YarnProtos.java:47102)
      at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom (YarnProtos.java:47097)
      at com.google.protobuf.CodedInputStream.readMessage (CodedInputStream.java:309)
      at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.<init> (YarnProtos.java:41016)
      at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.<init> (YarnProtos.java:40942)
      at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom (YarnProtos.java:41080)
      at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom (YarnProtos.java:41075)
      at com.google.protobuf.CodedInputStream.readMessage (CodedInputStream.java:309)
      at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.<init> (YarnServiceProtos.java:24517)
      at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.<init> (YarnServiceProtos.java:24464)
      at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom (YarnServiceProtos.java:24568)
      at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom (YarnServiceProtos.java:24563)
      at com.google.protobuf.AbstractParser.parsePartialFrom (AbstractParser.java:141)
      at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:176)
      at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:188)
      at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:193)
      at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:49)
      at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.parseFrom (YarnServiceProtos.java:24739)
      at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState (NMLeveldbStateStoreService.java:217)
      at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState (NMLeveldbStateStoreService.java:170)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover (ContainerManagerImpl.java:253)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit (ContainerManagerImpl.java:237)
      at org.apache.hadoop.service.AbstractService.init (AbstractService.java:163)
      at org.apache.hadoop.service.CompositeService.serviceInit (CompositeService.java:107)
      at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit (NodeManager.java:255)
      at org.apache.hadoop.service.AbstractService.init (AbstractService.java:163)
      at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager (NodeManager.java:474)
      at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main (NodeManager.java:521)
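
      The failure mode described above (every stored record is decoded and held in one list before recovery begins) can be contrasted with an iterator that decodes one record at a time. The following is only a minimal sketch under stated assumptions, not the code from the attached patches: the class LazyRecoverySketch, the RecoveredState type, and the methods loadAll, iterate and recoverLazily are hypothetical names, and a sorted in-memory map stands in for the on-disk state store.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LazyRecoverySketch {

    /** Hypothetical stand-in for one decoded container record. */
    static final class RecoveredState {
        final String containerId;
        final byte[] payload;

        RecoveredState(String containerId, byte[] payload) {
            this.containerId = containerId;
            this.payload = payload;
        }
    }

    /**
     * Eager variant: decodes every record up front. Peak memory grows with
     * the total size of the store, which is the pattern the description
     * identifies as the cause of the OOM.
     */
    static List<RecoveredState> loadAll(Map<String, byte[]> store) {
        List<RecoveredState> all = new ArrayList<>();
        for (Map.Entry<String, byte[]> e : store.entrySet()) {
            // clone() simulates the copy a real decoder would make
            all.add(new RecoveredState(e.getKey(), e.getValue().clone()));
        }
        return all;
    }

    /**
     * Lazy variant: a record is decoded only when next() is called, so the
     * caller holds one decoded record at a time.
     */
    static Iterator<RecoveredState> iterate(Map<String, byte[]> store) {
        final Iterator<Map.Entry<String, byte[]>> raw = store.entrySet().iterator();
        return new Iterator<RecoveredState>() {
            @Override
            public boolean hasNext() {
                return raw.hasNext();
            }

            @Override
            public RecoveredState next() {
                Map.Entry<String, byte[]> e = raw.next();
                return new RecoveredState(e.getKey(), e.getValue().clone());
            }
        };
    }

    /** Processes records one by one and returns how many were seen. */
    static int recoverLazily(Map<String, byte[]> store) {
        int recovered = 0;
        Iterator<RecoveredState> it = iterate(store);
        while (it.hasNext()) {
            RecoveredState state = it.next();
            // A real recovery step would act on 'state' here; once the loop
            // moves on, the record is no longer referenced and can be
            // garbage collected.
            if (state.payload != null) {
                recovered++;
            }
        }
        return recovered;
    }

    public static void main(String[] args) {
        // The map is a placeholder for the on-disk store (assumption).
        Map<String, byte[]> store = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            store.put(String.format("container_%04d", i), new byte[1024]);
        }
        System.out.println("eager list size: " + loadAll(store).size());
        System.out.println("lazily recovered: " + recoverLazily(store));
    }
}
```

      In the eager variant the list keeps every decoded record reachable until recovery finishes, while the lazy variant lets each record be collected as soon as it has been processed. Whether this matches the approach taken in the resolved issue should be checked against the patches themselves.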

      Attachments

        1. YARN-8242.001.patch
          24 kB
          Kanwaljeet Sachdev
        2. YARN-8242.002.patch
          26 kB
          Kanwaljeet Sachdev
        3. YARN-8242.003.patch
          28 kB
          Kanwaljeet Sachdev
        4. YARN-8242.004.patch
          52 kB
          Pradeep Ambati
        5. YARN-8242.005.patch
          74 kB
          Pradeep Ambati
        6. YARN-8242.006.patch
          75 kB
          Pradeep Ambati
        7. YARN-8242.007.patch
          84 kB
          Pradeep Ambati
        8. YARN-8242.008.patch
          78 kB
          Pradeep Ambati

            People

              Pradeep Ambati (pradeepambati)
              Kanwaljeet Sachdev (kanwaljeets)
              Votes: 0
              Watchers: 13
