Hadoop Map/Reduce / MAPREDUCE-3460

MR AM can hang if containers are allocated on a node blacklisted by the AM

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.23.0, 2.0.0-alpha
    • Fix Version/s: 0.23.1
    • Component/s: mr-am, mrv2
    • Labels: None

    Description

      When an AM is assigned a FAILED_MAP (priority = 5) container on a nodemanager that it has blacklisted, it tries to find a corresponding container request.
      The lookup matches on hostname only, so it can return any of the ContainerRequests that requested a container on this node, not necessarily the priority = 5 one. The returned container request is cleaned to remove the bad node and then added back to the RM 'ask' list.
      The AM clears the 'ask' list after each heartbeat. The RM Allocator is still aware of the priority = 5 container (in 'remoteRequestsTable'), but that request never gets added back to the 'ask' set, which is what is sent to the RM. As a result the RM is never asked again for the priority = 5 container, and the AM can hang waiting for it.
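
The failure mode described above can be sketched with a small, self-contained model. This is not the Hadoop code and not the attached patch: `AskListSketch`, `Request`, `findRequest`, `heartbeat`, and `simulate` are hypothetical names, the host names are made up, and the two collections only play the roles of 'remoteRequestsTable' and the 'ask' set. The `matchPriority` flag shows one possible alternative lookup (matching on priority as well as host) purely for contrast.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class AskListSketch {
    /** Hypothetical stand-in for a ContainerRequest: a priority plus requested hosts. */
    static final class Request {
        final int priority;
        final Set<String> hosts = new LinkedHashSet<>();

        Request(int priority, String... hosts) {
            this.priority = priority;
            for (String h : hosts) {
                this.hosts.add(h);
            }
        }
    }

    // Everything the AM still wants (plays the role of 'remoteRequestsTable').
    final List<Request> remoteRequestsTable = new ArrayList<>();
    // What will be sent on the next heartbeat (plays the role of the 'ask' set).
    final Set<Request> ask = new LinkedHashSet<>();

    void addRequest(Request r) {
        remoteRequestsTable.add(r);
        ask.add(r);
    }

    // Host-only lookup when matchPriority is false; host and priority otherwise.
    Request findRequest(String host, int priority, boolean matchPriority) {
        for (Request r : remoteRequestsTable) {
            if (r.hosts.contains(host) && (!matchPriority || r.priority == priority)) {
                return r;
            }
        }
        return null;
    }

    // Returns the priorities sent to the RM, then clears the 'ask' set.
    Set<Integer> heartbeat() {
        Set<Integer> sent = new TreeSet<>();
        for (Request r : ask) {
            sent.add(r.priority);
        }
        ask.clear();
        return sent;
    }

    // Returns the priorities sent on the heartbeat after the bad allocation.
    static Set<Integer> simulate(boolean matchPriority) {
        AskListSketch am = new AskListSketch();
        am.addRequest(new Request(20, "nm1", "nm2")); // assumed ordinary map request
        am.addRequest(new Request(5, "nm1", "nm3"));  // FAILED_MAP request
        am.heartbeat();                               // both sent, 'ask' cleared

        // The RM hands back a priority = 5 container on blacklisted node "nm1".
        Request matched = am.findRequest("nm1", 5, matchPriority);
        matched.hosts.remove("nm1"); // clean the bad node from the matched request
        am.ask.add(matched);         // re-add only the matched request to 'ask'
        return am.heartbeat();
    }
}
```

In this model, `AskListSketch.simulate(false)` re-sends only the priority 20 request, so the priority 5 request stays in the table but is never asked for again. `AskListSketch.simulate(true)` re-sends the priority 5 request instead.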

      Attachments

        1. MR-3460.txt
          7 kB
          Robert Joseph Evans
        2. MR-3460.txt
          9 kB
          Robert Joseph Evans
        3. MR3460_v3.txt
          13 kB
          Siddharth Seth
        4. MR3460_v4.txt
          13 kB
          Robert Joseph Evans

          People

            revans2 Robert Joseph Evans
            sseth Siddharth Seth
            Votes: 0
            Watchers: 6
