Hadoop Map/Reduce
  MAPREDUCE-3460

MR AM can hang if containers are allocated on a node blacklisted by the AM

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.23.0, 0.24.0
    • Fix Version/s: 0.23.1
    • Component/s: mr-am, mrv2
    • Labels:
      None

      Description

      When an AM is assigned a FAILED_MAP (priority = 5) container on a nodemanager which it has blacklisted, it tries to find a corresponding container request.
      The lookup uses the hostname to find the matching container request, so it can end up returning any of the ContainerRequests which may have requested a container on this node, not necessarily the priority = 5 one. This container request is cleaned to remove the bad node and then added back to the RM 'ask' list.
      The AM clears the 'ask' list after each heartbeat. The RM Allocator is still aware of the priority = 5 container (in 'remoteRequestsTable'), but that request never gets added back to the 'ask' set, which is what is sent to the RM. The RM is therefore never asked for a replacement priority = 5 container, and the AM can hang waiting for it.
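The sequence above can be sketched with a small, self-contained model. This is only an illustration under stated assumptions, not the actual MR AM code: the class AskListSketch, its methods, the host names, and the MAP priority of 20 are all hypothetical, and the "priority-aware" lookup is one plausible correction rather than a description of what the attached patches do.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model of the AM-side bookkeeping described above.
public class AskListSketch {
    static final int FAILED_MAP = 5; // priority named in the description
    static final int MAP = 20;       // assumed priority for ordinary map requests

    // priority -> hosts the request still names; stands in for 'remoteRequestsTable'
    final Map<Integer, Set<String>> remoteRequestsTable = new LinkedHashMap<>();
    // priorities whose requests go out on the next heartbeat; stands in for 'ask'
    final Set<Integer> ask = new TreeSet<>();

    void addRequest(int priority, String... hosts) {
        remoteRequestsTable.put(priority, new LinkedHashSet<>(Arrays.asList(hosts)));
        ask.add(priority);
    }

    // Returns what is sent to the RM, then clears 'ask' as the AM does after each heartbeat.
    Set<Integer> heartbeat() {
        Set<Integer> sent = new TreeSet<>(ask);
        ask.clear();
        return sent;
    }

    // Called when a container arrives on a blacklisted host.
    // matchPriority = false models the host-only lookup from the description;
    // matchPriority = true also requires the request's priority to match.
    void onContainerOnBlacklistedNode(String host, int allocatedPriority, boolean matchPriority) {
        for (Map.Entry<Integer, Set<String>> e : remoteRequestsTable.entrySet()) {
            boolean hostMatches = e.getValue().contains(host);
            boolean priorityMatches = !matchPriority || e.getKey() == allocatedPriority;
            if (hostMatches && priorityMatches) {
                e.getValue().remove(host); // clean the bad node out of the request
                ask.add(e.getKey());       // re-add this request to the 'ask' set
                return;
            }
        }
    }

    // Runs the scenario and returns the priorities sent on the second heartbeat.
    static Set<Integer> secondHeartbeat(boolean matchPriority) {
        AskListSketch am = new AskListSketch();
        am.addRequest(MAP, "badhost", "host2");
        am.addRequest(FAILED_MAP, "badhost", "host3");
        am.heartbeat(); // both requests sent; 'ask' is now empty
        am.onContainerOnBlacklistedNode("badhost", FAILED_MAP, matchPriority);
        return am.heartbeat();
    }

    public static void main(String[] args) {
        System.out.println("host-only lookup re-asks:      " + secondHeartbeat(false)); // [20]
        System.out.println("priority-aware lookup re-asks: " + secondHeartbeat(true));  // [5]
    }
}
```

With the host-only lookup the second heartbeat carries only the priority = 20 request, so the priority = 5 request stays in the table but is never re-sent. Matching on priority as well puts the priority = 5 request back into 'ask'.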

      1. MR3460_v3.txt
        13 kB
        Siddharth Seth
      2. MR3460_v4.txt
        13 kB
        Robert Joseph Evans
      3. MR-3460.txt
        9 kB
        Robert Joseph Evans
      4. MR-3460.txt
        7 kB
        Robert Joseph Evans

          People

          • Assignee: Robert Joseph Evans
          • Reporter: Siddharth Seth
          • Votes: 0
          • Watchers: 6
