  1. Hadoop Map/Reduce
  2. MAPREDUCE-3460

MR AM can hang if containers are allocated on a node blacklisted by the AM

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.23.0, 2.0.0-alpha
    • Fix Version/s: 0.23.1
    • Component/s: mr-am, mrv2
    • Labels:
      None

      Description

      When the AM is assigned a FAILED_MAP (priority = 5) container on a NodeManager that it has blacklisted, it tries to find a corresponding container request.
      The lookup matches on hostname only, so it can return any of the ContainerRequests that asked for a container on this node, not necessarily the priority = 5 one. The matched request is cleaned to remove the bad node and then added back to the RM 'ask' list.
      The AM clears the 'ask' list after each heartbeat. The RM Allocator is still aware of the priority = 5 request (in 'remoteRequestsTable'), but it never gets added back to the 'ask' set, which is what is sent to the RM. The request is therefore never re-sent, and the AM can hang waiting for a container.
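
      The sequence above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the actual Hadoop code: the class AskListSketch, its methods, the host name "badhost", and the priority 20 used for an ordinary map request are all hypothetical. It assumes the hostname lookup happens to return the other request first.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the AM-side bookkeeping; NOT the real Hadoop classes.
class AskListSketch {
    // Outstanding requests the allocator knows about: priority -> requested host.
    final Map<Integer, String> remoteRequestsTable = new LinkedHashMap<>();
    // Priorities whose requests will be sent to the RM on the next heartbeat.
    final Set<Integer> ask = new HashSet<>();

    void addRequest(int priority, String host) {
        remoteRequestsTable.put(priority, host);
        ask.add(priority);
    }

    // Sends the current 'ask' set, then clears it (as the description says the AM does).
    List<Integer> heartbeat() {
        List<Integer> sent = new ArrayList<>(ask);
        ask.clear();
        return sent;
    }

    // Lookup by hostname only: returns the first request naming the host,
    // regardless of its priority.
    Integer findRequestByHost(String host) {
        for (Map.Entry<Integer, String> e : remoteRequestsTable.entrySet()) {
            if (e.getValue().equals(host)) {
                return e.getKey();
            }
        }
        return null;
    }

    // A container arrives on a blacklisted host and is rejected by the AM.
    void onContainerOnBlacklistedHost(String host) {
        Integer matched = findRequestByHost(host);
        if (matched != null) {
            remoteRequestsTable.put(matched, "*"); // "clean" the request: drop the bad node
            ask.add(matched);                      // only the matched request is re-asked
        }
    }

    // Returns the priorities that are still outstanding but were not re-sent.
    static List<Integer> demo() {
        AskListSketch am = new AskListSketch();
        am.addRequest(20, "badhost"); // ordinary map request (assumed priority)
        am.addRequest(5, "badhost");  // FAILED_MAP request
        am.heartbeat();               // both requests sent, 'ask' cleared

        // The RM hands back the priority = 5 container on the blacklisted node.
        am.onContainerOnBlacklistedHost("badhost");

        List<Integer> resent = am.heartbeat();
        List<Integer> stuck = new ArrayList<>(am.remoteRequestsTable.keySet());
        stuck.removeAll(resent);
        return stuck;
    }

    public static void main(String[] args) {
        System.out.println("Outstanding but never re-asked: " + demo());
    }
}
```

      In this model the priority = 5 entry stays in the table but is absent from the second heartbeat, which mirrors the hang described above. Matching the rejected container to a request by priority as well as hostname would avoid the mismatch in the sketch; whether the attached patches take that approach is not stated here.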

        Attachments

        1. MR-3460.txt
          7 kB
          Robert Joseph Evans
        2. MR-3460.txt
          9 kB
          Robert Joseph Evans
        3. MR3460_v4.txt
          13 kB
          Robert Joseph Evans
        4. MR3460_v3.txt
          13 kB
          Siddharth Seth

            People

            • Assignee:
              revans2 Robert Joseph Evans
              Reporter:
              sseth Siddharth Seth
            • Votes:
              0
              Watchers:
              6
