Hadoop Map/Reduce
MAPREDUCE-2693

NPE in AM causes it to lose containers which are never returned back to RM

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: mrv2
    • Labels:
      None

      Description

      The NullPointerException shown below, raised in the AM of an application at the head of the queue, triggers the
      problem. Once it occurs, the AM keeps obtaining containers from the RM and loses track of them without ever
      returning them. On a cluster running multiple jobs, scheduling eventually stops because of these lost containers.

      It happens when the AM has blacklisted nodes at the application level. A bug in the AM
      (RMContainerRequestor.containerFailedOnHost(hostName)) causes this: the nodes are simply removed from the
      request table. We should make sure the RM also learns about this update.

      ========================================================================
      11/06/17 06:11:18 INFO rm.RMContainerAllocator: Assigned based on host match 98.138.163.34
      11/06/17 06:11:18 INFO rm.RMContainerRequestor: BEFORE decResourceRequest: applicationId=30 priority=20 resourceName=... numContainers=4978 #asks=5
      11/06/17 06:11:18 INFO rm.RMContainerRequestor: AFTER decResourceRequest: applicationId=30 priority=20 resourceName=... numContainers=4977 #asks=5
      11/06/17 06:11:18 INFO rm.RMContainerRequestor: BEFORE decResourceRequest: applicationId=30 priority=20 resourceName=... numContainers=1540 #asks=5
      11/06/17 06:11:18 INFO rm.RMContainerRequestor: AFTER decResourceRequest: applicationId=30 priority=20 resourceName=... numContainers=1539 #asks=6
      11/06/17 06:11:18 ERROR rm.RMContainerAllocator: ERROR IN CONTACTING RM.
      java.lang.NullPointerException
          at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor.decResourceRequest(RMContainerRequestor.java:246)
          at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor.decContainerReq(RMContainerRequestor.java:198)
          at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assign(RMContainerAllocator.java:523)
          at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.access$200(RMContainerAllocator.java:433)
          at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:151)
          at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:220)
          at java.lang.Thread.run(Thread.java:619)
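
The failure mode in the description can be illustrated with a small, self-contained sketch. This is not the code from RMContainerRequestor and not the attached patch: the class RequestTableSketch, its methods, and the pendingAsks map are hypothetical names chosen for illustration, and the table is reduced to a single priority. It assumes the request table maps a resource name (host, rack, or "*") to an outstanding container count, that blacklisting a host drops its entry, and that a later decrement for a container assigned on that host then dereferences a missing entry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the AM's request table (not the real RMContainerRequestor).
class RequestTableSketch {
    // resource name (host, rack, or "*") -> outstanding container count
    private final Map<String, Integer> outstanding = new HashMap<>();
    // Assumed mechanism: asks queued for the RM on the next heartbeat, resource name -> count
    private final Map<String, Integer> pendingAsks = new HashMap<>();

    void addRequest(String resourceName) {
        int count = outstanding.merge(resourceName, 1, Integer::sum);
        pendingAsks.put(resourceName, count);
    }

    // Behavior described in the report: the entry is dropped locally and the RM is not told.
    void blacklistSilently(String hostName) {
        outstanding.remove(hostName);
    }

    // One possible alternative (an assumption, not the actual fix): also queue a
    // zero-count ask so the RM stops allocating containers on that host.
    void blacklistAndNotify(String hostName) {
        if (outstanding.remove(hostName) != null) {
            pendingAsks.put(hostName, 0);
        }
    }

    // Unguarded decrement: unboxing the missing (null) entry throws NullPointerException.
    void decUnguarded(String resourceName) {
        int remaining = outstanding.get(resourceName) - 1;
        outstanding.put(resourceName, remaining);
        pendingAsks.put(resourceName, remaining);
    }

    // Guarded decrement: returns false instead of throwing when the entry is gone.
    boolean decGuarded(String resourceName) {
        Integer current = outstanding.get(resourceName);
        if (current == null) {
            return false;
        }
        int remaining = current - 1;
        if (remaining == 0) {
            outstanding.remove(resourceName);
        } else {
            outstanding.put(resourceName, remaining);
        }
        pendingAsks.put(resourceName, remaining);
        return true;
    }

    Integer pendingAsk(String resourceName) {
        return pendingAsks.get(resourceName);
    }

    public static void main(String[] args) {
        RequestTableSketch t = new RequestTableSketch();
        t.addRequest("host1");
        t.addRequest("*");
        t.blacklistSilently("host1");
        try {
            t.decUnguarded("host1"); // container assigned on the blacklisted host
        } catch (NullPointerException e) {
            System.out.println("unguarded decrement threw NPE");
        }
        System.out.println("guarded decrement applied: " + t.decGuarded("host1"));
    }
}
```

A null guard alone only avoids the exception. The description also asks that the RM be told about the removal, which the sketch approximates with the zero-count ask in blacklistAndNotify.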

      Attachments

      1. MR-2693.3.patch (23 kB, Hitesh Shah)
      2. MR-2693.2.patch (19 kB, Hitesh Shah)
      3. MR-2693.1.patch (17 kB, Hitesh Shah)


          Activity

          Amol Kekre created issue
          Arun C Murthy made changes:
            Assignee: set to Sharad Agarwal [ sharadag ]
          Hitesh Shah made changes:
            Assignee: Sharad Agarwal [ sharadag ] -> Hitesh Shah [ hitesh ]
          Hitesh Shah made changes:
            Attachment: added MR-2693.1.patch [ 12499113 ]
          Hitesh Shah made changes:
            Status: Open [ 1 ] -> Patch Available [ 10002 ]
          Hitesh Shah made changes:
            Affects Version/s: set to 0.23.0 [ 12315570 ]
          Hitesh Shah made changes:
            Status: Patch Available [ 10002 ] -> Open [ 1 ]
          Hitesh Shah made changes:
            Attachment: added MR-2693.2.patch [ 12499617 ]
          Hitesh Shah made changes:
            Status: Open [ 1 ] -> Patch Available [ 10002 ]
          Vinod Kumar Vavilapalli made changes:
            Status: Patch Available [ 10002 ] -> Open [ 1 ]
          Hitesh Shah made changes:
            Attachment: added MR-2693.3.patch [ 12499746 ]
          Hitesh Shah made changes:
            Status: Open [ 1 ] -> Patch Available [ 10002 ]
          Arun C Murthy made changes:
            Status: Patch Available [ 10002 ] -> Resolved [ 5 ]
            Resolution: set to Fixed [ 1 ]
          Vinod Kumar Vavilapalli made changes:
            Link: added "This issue duplicates MAPREDUCE-3234 [ MAPREDUCE-3234 ]"
          Arun C Murthy made changes:
            Status: Resolved [ 5 ] -> Closed [ 6 ]

            People

            • Assignee: Hitesh Shah
            • Reporter: Amol Kekre
            • Votes: 0
            • Watchers: 2
