Hadoop Map/Reduce / MAPREDUCE-2693

NPE in AM causes it to lose containers which are never returned back to RM

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: mrv2
    • Labels:
      None

      Description

      The following exception in the AM of the application at the head of the queue causes this. Once it happens, the AM
      keeps obtaining containers from the RM and simply loses them. Eventually, on a cluster with multiple jobs, no more
      scheduling happens because of these lost containers.

      It happens when there are blacklisted nodes at the app level in the AM. A bug in the AM
      (RMContainerRequestor.containerFailedOnHost(hostName)) causes this: blacklisted nodes are simply removed from the
      local request-table without the change being sent to the RM. We should make sure the RM also knows about this update.

      ========================================================================
      11/06/17 06:11:18 INFO rm.RMContainerAllocator: Assigned based on host match 98.138.163.34
      11/06/17 06:11:18 INFO rm.RMContainerRequestor: BEFORE decResourceRequest: applicationId=30 priority=20 resourceName=... numContainers=4978 #asks=5
      11/06/17 06:11:18 INFO rm.RMContainerRequestor: AFTER decResourceRequest: applicationId=30 priority=20 resourceName=... numContainers=4977 #asks=5
      11/06/17 06:11:18 INFO rm.RMContainerRequestor: BEFORE decResourceRequest: applicationId=30 priority=20 resourceName=... numContainers=1540 #asks=5
      11/06/17 06:11:18 INFO rm.RMContainerRequestor: AFTER decResourceRequest: applicationId=30 priority=20 resourceName=... numContainers=1539 #asks=6
      11/06/17 06:11:18 ERROR rm.RMContainerAllocator: ERROR IN CONTACTING RM.
      java.lang.NullPointerException
      at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor.decResourceRequest(RMContainerRequestor.java:246)
      at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor.decContainerReq(RMContainerRequestor.java:198)
      at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assign(RMContainerAllocator.java:523)
      at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.access$200(RMContainerAllocator.java:433)
      at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:151)
      at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:220)
      at java.lang.Thread.run(Thread.java:619)
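
The failure mode described above can be illustrated with a small, self-contained sketch. This is not the actual RMContainerRequestor code: the class RequestTableSketch, its flat host-to-count table, the "host=count" ask strings, and all method names are hypothetical simplifications. It only shows why removing a blacklisted host from the local table makes a later decrement throw, and one possible remedy (assumed here, not taken from the attached patches): zero the entry, queue that update for the RM, and guard the decrement.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in for the AM's request bookkeeping. */
public class RequestTableSketch {
    // host -> number of containers still requested on that host
    final Map<String, Integer> requestTable = new HashMap<>();
    // updates queued for the next heartbeat to the RM, as "host=count"
    final List<String> asks = new ArrayList<>();

    void addRequest(String host) {
        int n = requestTable.merge(host, 1, Integer::sum);
        asks.add(host + "=" + n);
    }

    // Buggy variant: the entry is dropped locally and the RM is never told.
    void blacklistBuggy(String host) {
        requestTable.remove(host);
    }

    // Buggy variant: assumes the entry still exists.
    void decBuggy(String host) {
        // Unboxing a null Integer throws NullPointerException.
        int remaining = requestTable.get(host) - 1;
        requestTable.put(host, remaining);
        asks.add(host + "=" + remaining);
    }

    // Assumed remedy: keep the entry, set it to zero, and queue the update.
    void blacklistAndNotify(String host) {
        if (requestTable.containsKey(host)) {
            requestTable.put(host, 0);
            asks.add(host + "=0");
        }
    }

    // Guarded decrement: ignores hosts with no outstanding request.
    void decGuarded(String host) {
        Integer current = requestTable.get(host);
        if (current == null || current == 0) {
            return;
        }
        int remaining = current - 1;
        requestTable.put(host, remaining);
        asks.add(host + "=" + remaining);
    }

    public static void main(String[] args) {
        RequestTableSketch buggy = new RequestTableSketch();
        buggy.addRequest("host1");
        buggy.blacklistBuggy("host1");
        try {
            // A container arrives on the blacklisted host anyway.
            buggy.decBuggy("host1");
        } catch (NullPointerException e) {
            System.out.println("buggy path threw " + e.getClass().getSimpleName());
        }

        RequestTableSketch fixed = new RequestTableSketch();
        fixed.addRequest("host1");
        fixed.blacklistAndNotify("host1");
        fixed.decGuarded("host1");
        System.out.println("asks queued for RM: " + fixed.asks);
    }
}
```

In the sketch, the guarded path leaves a zero-count entry in the table and a matching entry in the ask list, so the RM would learn that nothing more is wanted on that host.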

        Attachments

        1. MR-2693.3.patch
          23 kB
          Hitesh Shah
        2. MR-2693.2.patch
          19 kB
          Hitesh Shah
        3. MR-2693.1.patch
          17 kB
          Hitesh Shah

              People

              • Assignee: Hitesh Shah (hitesh)
              • Reporter: Amol Kekre (amolkekre)
              • Votes: 0
              • Watchers: 2