diff --git hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnHA.apt.vm hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnHA.apt.vm
new file mode 100644
index 0000000..4e2c5b8
--- /dev/null
+++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnHA.apt.vm
@@ -0,0 +1,165 @@
+~~ Licensed under the Apache License, Version 2.0 (the "License");
+~~ you may not use this file except in compliance with the License.
+~~ You may obtain a copy of the License at
+~~
+~~   http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License. See accompanying LICENSE file.
+
+  ---
+  YARN High Availability
+  ---
+  ---
+  ${maven.build.timestamp}
+
+YARN High Availability
+
+  \[ {{{./index.html}Go Back}} \]
+
+%{toc|section=1|fromDepth=0}
+
+* Introduction
+
+  This guide provides an overview of YARN ResourceManager High Availability,
+  and details on how to configure and use this feature. The ResourceManager
+  (RM) is responsible for tracking the resources in a cluster and scheduling
+  applications (e.g., MapReduce jobs). Prior to Hadoop 2.4, the ResourceManager
+  was the single point of failure in a YARN cluster. The High Availability
+  feature adds redundancy in the form of an Active/Standby ResourceManager pair
+  to remove this otherwise single point of failure. Furthermore, upon failover
+  from the Active ResourceManager to the Standby, applications can resume from
+  their last check-pointed state; e.g., completed map tasks in a MapReduce job
+  are not re-run on a subsequent attempt. This allows handling (1) unplanned
+  events like machine crashes, and (2) planned maintenance events such as
+  software or hardware upgrades on the machine running the ResourceManager,
+  without any significant performance impact on running applications.
+
+* Architecture
+
+  The ResourceManager availability story starts with the ability to save the
+  internal state of the RM via the State Store. The store enables restarting
+  the RM without losing any state. ResourceManager HA is realized through an
+  Active/Standby architecture - one RM is Active, and one or more RMs are in
+  Standby mode waiting to take over should anything happen to the Active.
+
+** The State Store
+
+  The state store enables storing the internal state of the RM - applications
+  and their attempts, delegation tokens, and version information. The state of
+  the cluster - resource consumption on individual nodes - is not stored, as it
+  can be easily reconstructed when the nodes heartbeat to the "new" RM. The
+  available alternatives for the state store are MemoryRMStateStore (a
+  memory-based implementation used primarily for testing),
+  FileSystemRMStateStore (a file system-based implementation; HDFS can be used
+  as the underlying file system), and ZKRMStateStore (a ZooKeeper-based
+  implementation). The ZKRMStateStore implicitly allows write access to a
+  single RM at any point in time, and hence is the recommended store to use in
+  an HA cluster. When using the ZKRMStateStore, there is no need for a separate
+  fencing mechanism to address a potential split-brain situation where multiple
+  RMs assume the Active role.
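+
+  As an illustration, a minimal yarn-site.xml sketch that enables recovery
+  backed by the ZKRMStateStore might look as follows; the ZooKeeper quorum
+  address shown here is only a placeholder for the actual ensemble:
+
++---+
+ <property>
+   <name>yarn.resourcemanager.recovery.enabled</name>
+   <value>true</value>
+ </property>
+ <property>
+   <name>yarn.resourcemanager.store.class</name>
+   <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
+ </property>
+ <!-- Placeholder quorum; replace with the actual ZooKeeper ensemble -->
+ <property>
+   <name>yarn.resourcemanager.zk-address</name>
+   <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
+ </property>
++---+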
+
+
+** RM Restart
+
+  With the state store enabled, a restarted RM loads the internal state and
+  continues to operate as if it never actually went down. The scheduler
+  reconstructs its state from node heartbeats. A new attempt is spawned for
+  each managed application previously submitted to the RM. Applications can
+  checkpoint periodically to avoid losing any work; the MR-AM checkpoints
+  completed tasks, so the completed tasks need not be re-run on RM restart.
+
+** RM HA
+
+  RM HA is an extension of RM restart; a second RM takes over in case the first
+  RM goes down for any reason. On start-up, each RM is in the Standby state -
+  the process is started, but the state is not loaded. When transitioning to
+  Active, the RM loads the internal state from the designated state store and
+  starts all the internal services to assume the Active RM duties. The stimulus
+  to transition to Active comes either from the admin (through the CLI) or from
+  the integrated failover controller when automatic failover is enabled.
+
+*** Client, ApplicationMaster and NodeManager failover
+
+  When there are multiple RMs, the YARN configuration (yarn-site.xml) used by
+  clients and nodes is expected to list all the RMs. Clients,
+  ApplicationMasters (AMs) and NodeManagers (NMs) try connecting to the RMs in
+  a round-robin fashion until they hit the Active RM. If the Active goes down,
+  they resume the round-robin polling until they hit the "new" Active.
+
+*** Manual transitions and failover
+
+  When automatic failover is not enabled, admins have to manually transition
+  one of the RMs to Active. To fail over from one RM to the other, they are
+  expected to first transition the Active RM to Standby and then transition a
+  Standby RM to Active. All this can be done using the "yarn rmadmin" CLI.
+
+*** Automatic failover
+
+  The RMs embed the ZooKeeper-based ActiveStandbyElector to decide which RM
+  should be the Active. When the Active goes down or becomes unresponsive,
+  another RM is automatically elected to be the Active and takes over. Note
+  that there is no need to run a separate ZKFC daemon as is the case for
+  HDFS.
+
+* Deployment
+
+** Configurations
+
+  All the above features are controlled by configuration knobs. Here is a list
+  of the required/important ones; yarn-default.xml carries the full list.
+
+*---------------+--------------+
+|| Configuration Property || Description |
+*---------------+--------------+
+| yarn.resourcemanager.recovery.enabled | Enables the state store,
+| | and application recovery on RM restart
+*---------------+--------------+
+| yarn.resourcemanager.store.class | The RMStateStore implementation to use.
+| | Suggest using org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
+*---------------+--------------+
+| yarn.resourcemanager.zk-address | Address of the ZK-quorum.
+| | Used both for the store and embedded leader election.
+*---------------+--------------+
+| yarn.resourcemanager.ha.enabled | Enable RM HA
+*---------------+--------------+
+| yarn.resourcemanager.ha.rm-ids | List of logical IDs for the RMs.
+| | e.g., "rm1,rm2"
+*---------------+--------------+
+| yarn.resourcemanager.hostname.<rm-id> | For each <rm-id>, specify the hostname the
+| | RM corresponds to. Alternatively, one could set each of the RM's RPC addresses.
+| | See yarn-default.xml for more information.
+*---------------+--------------+
+| yarn.resourcemanager.ha.id | Identifies the RM in the ensemble. This is optional;
+| | however, if set, ensure that each RM has its own ID in its local configuration.
+*---------------+--------------+
+| yarn.resourcemanager.ha.automatic-failover.enabled | Enable automatic failover;
+| | by default, it is enabled only when HA is enabled.
+*---------------+--------------+
+| yarn.resourcemanager.ha.automatic-failover.embedded | Use the embedded leader-elector
+| | to pick the Active RM when automatic failover is enabled. By default,
+| | it is enabled only when HA is enabled.
+*---------------+--------------+
+| yarn.resourcemanager.cluster-id | Identifies the cluster. Used by the elector to
+| | ensure an RM doesn't take over as Active for another cluster.
+*---------------+--------------+
+
+** Admin commands
+
+  yarn rmadmin has a few HA-specific commands to check the health/state of an
+  RM, and to transition it to Active/Standby. See
+  {{{./YarnCommands.html}YarnCommands}} for more details.
+
+** Web UI
+
+  The Standby RM's web UI automatically redirects to the Active RM, except for
+  the "About" page.
+
+** Web Services
+
+  The web services on the Standby automatically redirect to the Active RM.
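+
+** Sample configuration
+
+  The following yarn-site.xml sketch adds the HA-specific properties from the
+  Configurations table above to the state-store settings shown earlier. It is
+  illustrative only: the logical IDs (rm1, rm2), the hostnames, and the cluster
+  ID are placeholders to be replaced with values appropriate for the deployment.
+
++---+
+ <!-- Enable HA and name the RMs; "rm1,rm2" follows the example in the table -->
+ <property>
+   <name>yarn.resourcemanager.ha.enabled</name>
+   <value>true</value>
+ </property>
+ <property>
+   <name>yarn.resourcemanager.ha.rm-ids</name>
+   <value>rm1,rm2</value>
+ </property>
+
+ <!-- Placeholder hostnames for the two RMs -->
+ <property>
+   <name>yarn.resourcemanager.hostname.rm1</name>
+   <value>rm1.example.com</value>
+ </property>
+ <property>
+   <name>yarn.resourcemanager.hostname.rm2</name>
+   <value>rm2.example.com</value>
+ </property>
+
+ <!-- Placeholder cluster ID used by the elector -->
+ <property>
+   <name>yarn.resourcemanager.cluster-id</name>
+   <value>yarn-cluster</value>
+ </property>
++---+
+
+  With automatic failover left at its default (enabled whenever HA is enabled),
+  no further settings are required. Once both RMs are started, the state of
+  each can be checked with "yarn rmadmin -getServiceState rm1" (and rm2).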