Hadoop Map/Reduce / MAPREDUCE-2693

NPE in AM causes it to lose containers which are never returned back to RM

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: mrv2
    • Labels: None

    Description

      The exception below, thrown in the ApplicationMaster (AM) of the application at the head of the queue, causes this
      problem. Once it occurs, the AM keeps obtaining containers from the ResourceManager (RM) and simply loses them: they
      are neither used nor released. Eventually, on a cluster running multiple jobs, no more scheduling happens because
      these lost containers are never returned to the RM.
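To make the failure mode concrete, here is a hypothetical simulation; ContainerLeakSimulation, simulate, and assignContainers are illustrative names, not Hadoop code. It assumes the RM counts a container as allocated as soon as it hands it out, and (a simplification) that a healthy AM returns its containers within the same heartbeat. When the AM's assignment step throws, the granted containers are dropped and the RM's free capacity drains to zero.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical simulation of the container leak: the RM grants containers on
 * each heartbeat, but a failing AM drops them without releasing them.
 */
public class ContainerLeakSimulation {

    /** Returns the RM's free capacity after each heartbeat. */
    static List<Integer> simulate(int clusterCapacity, int askPerHeartbeat,
                                  int heartbeats, boolean amFails) {
        int free = clusterCapacity;          // containers the RM can still hand out
        List<Integer> freeAfterEach = new ArrayList<>();
        for (int i = 0; i < heartbeats; i++) {
            int granted = Math.min(askPerHeartbeat, free);
            free -= granted;                 // RM now considers these allocated
            try {
                assignContainers(granted, amFails);
                // Simplification: a healthy AM uses the containers and
                // returns them to the RM within the same heartbeat.
                free += granted;
            } catch (NullPointerException e) {
                // Mirrors "ERROR IN CONTACTING RM": the exception is logged and
                // the granted containers are lost, never released to the RM.
            }
            freeAfterEach.add(free);
        }
        return freeAfterEach;
    }

    /** Stand-in for the AM's assignment step; fails like the stack trace when amFails is set. */
    static void assignContainers(int granted, boolean amFails) {
        if (amFails && granted > 0) {
            throw new NullPointerException("request-table entry missing");
        }
    }

    public static void main(String[] args) {
        System.out.println("healthy AM: " + simulate(10, 4, 4, false)); // prints healthy AM: [10, 10, 10, 10]
        System.out.println("failing AM: " + simulate(10, 4, 4, true));  // prints failing AM: [6, 2, 0, 0]
    }
}
```

With a 10-container cluster and 4 containers granted per heartbeat, the healthy AM leaves all 10 free after every heartbeat, while the failing AM leaves 6, 2, 0, 0: once capacity hits zero, nothing else on the cluster can be scheduled.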

      It happens when the AM has blacklisted nodes at the application level. The cause is a bug in the AM's
      RMContainerRequestor.containerFailedOnHost(hostName): blacklisted nodes are simply removed from the local request
      table, so the RM never learns of the change. We should make sure the RM is also informed of this update.
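A minimal sketch of the bookkeeping involved, assuming a simplified request table (priority to resource name to outstanding count) and an ask set that is sent to the RM on the next heartbeat. BlacklistSketch and its methods are hypothetical; they are not the RMContainerRequestor code or the committed patch. blacklistSilently mirrors the behavior described above, while blacklistAndNotify shows one way to keep the RM informed, assuming, as in YARN's allocate protocol, that a newer ask for the same resource name replaces the older one, so a zero count cancels it. For brevity, rack and "*" counts are not adjusted.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified AM-side request table:
 * priority -> resource name (host, rack or "*") -> outstanding request.
 */
public class BlacklistSketch {

    static final class Request {
        final String resourceName;
        int numContainers;

        Request(String resourceName, int numContainers) {
            this.resourceName = resourceName;
            this.numContainers = numContainers;
        }
    }

    final Map<Integer, Map<String, Request>> table = new HashMap<>();
    /** Requests whose current state must be sent to the RM on the next heartbeat. */
    final Set<Request> ask = new LinkedHashSet<>();

    void addRequest(int priority, String resourceName, int count) {
        Request req = table.computeIfAbsent(priority, p -> new HashMap<>())
                .computeIfAbsent(resourceName, r -> new Request(r, 0));
        req.numContainers += count;
        ask.add(req);
    }

    /** Buggy variant: drops the host locally; the RM keeps its old, non-zero ask. */
    void blacklistSilently(String host) {
        for (Map<String, Request> byResource : table.values()) {
            Request removed = byResource.remove(host);
            if (removed != null) {
                ask.remove(removed); // nothing about this host reaches the RM
            }
        }
    }

    /** Alternative: drop the host locally but queue a zero-count ask for the RM. */
    void blacklistAndNotify(String host) {
        for (Map<String, Request> byResource : table.values()) {
            Request removed = byResource.remove(host);
            if (removed != null) {
                removed.numContainers = 0; // zero count cancels the RM-side ask
                ask.add(removed);
            }
        }
    }
}
```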

      ========================================================================
      11/06/17 06:11:18 INFO rm.RMContainerAllocator: Assigned based on host match 98.138.163.34
      11/06/17 06:11:18 INFO rm.RMContainerRequestor: BEFORE decResourceRequest: applicationId=30 priority=20 resourceName=... numContainers=4978 #asks=5
      11/06/17 06:11:18 INFO rm.RMContainerRequestor: AFTER decResourceRequest: applicationId=30 priority=20 resourceName=... numContainers=4977 #asks=5
      11/06/17 06:11:18 INFO rm.RMContainerRequestor: BEFORE decResourceRequest: applicationId=30 priority=20 resourceName=... numContainers=1540 #asks=5
      11/06/17 06:11:18 INFO rm.RMContainerRequestor: AFTER decResourceRequest: applicationId=30 priority=20 resourceName=... numContainers=1539 #asks=6
      11/06/17 06:11:18 ERROR rm.RMContainerAllocator: ERROR IN CONTACTING RM.
      java.lang.NullPointerException
          at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor.decResourceRequest(RMContainerRequestor.java:246)
          at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor.decContainerReq(RMContainerRequestor.java:198)
          at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assign(RMContainerAllocator.java:523)
          at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.access$200(RMContainerAllocator.java:433)
          at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:151)
          at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:220)
          at java.lang.Thread.run(Thread.java:619)
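The trace shows the NPE raised in decResourceRequest, called from decContainerReq while a container is being assigned. A minimal sketch of that failure mode, assuming a simplified table mapping a resource name (host, rack, or "*" for any) to an outstanding count; DecrementSketch, decUnguarded, decGuarded, and the host1/rack1 names are hypothetical. The null guard is one possible defense, not necessarily what the attached patches do, and on its own it only avoids the crash: the RM still has to be told about the removed host.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the decrement path from the stack trace
 * (decContainerReq -> decResourceRequest), reduced to one priority and plain counts.
 */
public class DecrementSketch {

    /** resource name (host, rack or "*") -> outstanding container count */
    final Map<String, Integer> requests = new HashMap<>();

    /** Unguarded: unboxing a missing entry throws NullPointerException. */
    void decUnguarded(String resourceName) {
        int before = requests.get(resourceName); // NPE if the host was removed
        requests.put(resourceName, before - 1);
    }

    /** Guarded: skip resource names that are no longer in the table. */
    boolean decGuarded(String resourceName) {
        Integer before = requests.get(resourceName);
        if (before == null) {
            return false; // e.g. the host was already dropped by blacklisting
        }
        requests.put(resourceName, before - 1);
        return true;
    }

    /** Decrements every locality level for a container assigned on the given host. */
    void decContainerReq(String host, String rack) {
        for (String name : new String[] {host, rack, "*"}) {
            decGuarded(name);
        }
    }

    public static void main(String[] args) {
        DecrementSketch s = new DecrementSketch();
        s.requests.put("rack1", 2);
        s.requests.put("*", 5);
        // "host1" is absent, as if blacklisting had removed it.
        s.decContainerReq("host1", "rack1");
        System.out.println(s.requests.get("rack1") + " " + s.requests.get("*")); // prints "1 4"
    }
}
```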

      Attachments

        1. MR-2693.1.patch (17 kB, Hitesh Shah)
        2. MR-2693.2.patch (19 kB, Hitesh Shah)
        3. MR-2693.3.patch (23 kB, Hitesh Shah)

    People

      Assignee: Hitesh Shah
      Reporter: Amol Kekre
      Votes: 0
      Watchers: 2