Hadoop YARN / YARN-2641

Decommission nodes on -refreshNodes instead of next NM-RM heartbeat

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.5.0
    • Fix Version/s: 2.7.0
    • Component/s: resourcemanager
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Improve node decommission latency in the RM.
      Currently, a node is decommissioned only after the RM receives a nodeHeartbeat from the NodeManager. The node heartbeat interval is configurable; the default value is 1 second.
      It would be better to do the decommission during the RM refresh (NodesListManager) instead of on nodeHeartbeat (ResourceTrackerService).
      The current behavior also causes a much more serious issue:
      If the NM to be decommissioned is killed after the RM is refreshed (refreshNodes) but before the NM sends its next heartbeat to the RM, the RMNode is never decommissioned in the RM. The RMNode only expires in the RM after "yarn.nm.liveness-monitor.expiry-interval-ms" (default value 10 minutes).
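
The difference between the two behaviors described above can be illustrated with a minimal sketch. This is not the actual patch and does not use the real YARN classes: DecommissionSketch, NodeState, refreshNodesLazy, refreshNodesEager, and onHeartbeat are hypothetical names that only model the idea of the RM's exclude list and node states.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of RM node bookkeeping; not the real YARN code. */
class DecommissionSketch {
    enum NodeState { RUNNING, DECOMMISSIONED }

    private final Map<String, NodeState> nodes = new HashMap<>();
    private Set<String> excludedHosts = new HashSet<>();

    /** A node registers with the RM and starts in the RUNNING state. */
    void register(String host) {
        nodes.put(host, NodeState.RUNNING);
    }

    NodeState stateOf(String host) {
        return nodes.get(host);
    }

    /** Behavior described as current: refresh only records the new exclude list. */
    void refreshNodesLazy(Set<String> newExcludes) {
        excludedHosts = new HashSet<>(newExcludes);
    }

    /** Behavior proposed in this issue: refresh also decommissions excluded nodes at once. */
    void refreshNodesEager(Set<String> newExcludes) {
        excludedHosts = new HashSet<>(newExcludes);
        for (Map.Entry<String, NodeState> entry : nodes.entrySet()) {
            if (excludedHosts.contains(entry.getKey())) {
                entry.setValue(NodeState.DECOMMISSIONED);
            }
        }
    }

    /** Heartbeat path: an excluded node is decommissioned when it next reports in. */
    void onHeartbeat(String host) {
        if (nodes.containsKey(host) && excludedHosts.contains(host)) {
            nodes.put(host, NodeState.DECOMMISSIONED);
        }
    }
}
```

In this model, a node that dies right after refreshNodesLazy never calls onHeartbeat and therefore stays RUNNING until some separate expiry mechanism removes it, whereas refreshNodesEager changes its state without waiting for any heartbeat.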

        Attachments

        1. YARN-2641.003.patch
          9 kB
          Zhihai Xu
        2. YARN-2641.002.patch
          14 kB
          Zhihai Xu
        3. YARN-2641.001.patch
          13 kB
          Zhihai Xu
        4. YARN-2641.000.patch
          9 kB
          Zhihai Xu

            People

            • Assignee:
              zxu Zhihai Xu
              Reporter:
              zxu Zhihai Xu
            • Votes:
              0
              Watchers:
              8
