Hadoop YARN / YARN-1265

Fair Scheduler chokes on unhealthy node reconnect

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.1-beta
    • Fix Version/s: 2.3.0
    • Component/s: resourcemanager, scheduler
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Schedulers track only nodes in the RUNNING state. When a node reconnects, RMNodeImpl.ReconnectNodeTransition tells the scheduler to remove it even if it is not in the RUNNING state (for example, an unhealthy node), so the scheduler is asked to remove a node it never tracked. The FairScheduler doesn't guard against this.

      I think the best fix is to check whether a node is RUNNING before telling the scheduler to remove it.
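
      To illustrate the proposed guard, here is a minimal, self-contained sketch. It is not the actual patch and does not use the real YARN classes: NodeState, SketchScheduler, and handleReconnect are hypothetical stand-ins, and the scheduler is assumed to throw when asked to remove a node it does not track.

```java
import java.util.HashSet;
import java.util.Set;

public class ReconnectGuardSketch {
    // Hypothetical, simplified stand-in for the node states involved.
    enum NodeState { NEW, RUNNING, UNHEALTHY }

    // Hypothetical scheduler that tracks only RUNNING nodes and, like the
    // behavior described in this issue, does not guard against removing
    // a node it never tracked.
    static class SketchScheduler {
        private final Set<String> trackedNodes = new HashSet<>();

        void addNode(String nodeId) {
            trackedNodes.add(nodeId);
        }

        void removeNode(String nodeId) {
            if (!trackedNodes.remove(nodeId)) {
                throw new IllegalStateException("Node not tracked: " + nodeId);
            }
        }

        boolean isTracked(String nodeId) {
            return trackedNodes.contains(nodeId);
        }
    }

    // Reconnect handling with the proposed guard: the scheduler is told to
    // remove the node only if the node was RUNNING, that is, only if the
    // scheduler could have been tracking it.
    static void handleReconnect(SketchScheduler scheduler, String nodeId, NodeState state) {
        if (state == NodeState.RUNNING) {
            scheduler.removeNode(nodeId);
        }
        // Re-registering the reconnected node is omitted from this sketch.
    }

    public static void main(String[] args) {
        SketchScheduler scheduler = new SketchScheduler();
        scheduler.addNode("node-1");

        // A RUNNING node is tracked, so removal succeeds.
        handleReconnect(scheduler, "node-1", NodeState.RUNNING);

        // An UNHEALTHY node was never tracked; the guard skips removal,
        // so the scheduler does not throw.
        handleReconnect(scheduler, "node-2", NodeState.UNHEALTHY);

        System.out.println("node-1 tracked: " + scheduler.isTracked("node-1"));
    }
}
```

      Without the state check in handleReconnect, the second call would reach removeNode for an untracked node and fail, which mirrors the failure mode described above.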

        Attachments

        1. YARN-1265.patch
          3 kB
          Sandy Ryza
        2. YARN-1265-1.patch
          1 kB
          Sandy Ryza

            People

            • Assignee:
              sandyr Sandy Ryza
              Reporter:
              sandyr Sandy Ryza
            • Votes:
              0
              Watchers:
              3
