diff --git hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRestart.apt.vm hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRestart.apt.vm
new file mode 100644
index 0000000..591bff2
--- /dev/null
+++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRestart.apt.vm
@@ -0,0 +1,200 @@
+~~ Licensed under the Apache License, Version 2.0 (the "License");
+~~ you may not use this file except in compliance with the License.
+~~ You may obtain a copy of the License at
+~~
+~~ http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License. See accompanying LICENSE file.
+
+  ---
+  ResourceManager Restart
+  ---
+  ---
+  ${maven.build.timestamp}
+
+ResourceManager Restart
+
+%{toc|section=1|fromDepth=0}
+
+* {Overview}
+
+  ResourceManager is the central authority that manages resources and schedules
+  applications running atop YARN. Hence, it is a single point of failure in an
+  Apache YARN cluster.
+
+  This document gives an overview of ResourceManager Restart, a feature that
+  enhances ResourceManager to keep functioning across restarts and also makes
+  ResourceManager down-time transparent to end-users.
+
+  The ResourceManager Restart feature is divided into two phases:
+
+  ResourceManager Restart Phase 1: Enhance RM to persist application/attempt state
+  and other credential information in a pluggable state-store. RM will reload
+  this information from the state-store upon restart and re-kick the previously
+  running applications. Users are not required to re-submit the applications.
+
+  ResourceManager Restart Phase 2:
+  Focus on re-constructing the running state of ResourceManager by reading back
+  the container statuses from NodeManagers and container requests from ApplicationMasters
+  upon restart. The key difference from Phase 1 is that previously running applications
+  will not be killed after RM restarts, so applications will not lose their work
+  because of an RM outage.
+
+  This document only describes ResourceManager Restart Phase 1.
+
+* {Feature}
+
+  The overall concept is that RM persists the ApplicationSubmissionContext in
+  a pluggable state-store when a client submits an application, and also saves the
+  final status of the application, such as the completion state (failed, killed,
+  finished) and the diagnostics, when the application completes. Besides, RM also saves
+  credentials such as security keys and tokens in order to work in a secure environment.
+  Any time RM shuts down, as long as the required information (i.e. the ApplicationSubmissionContext
+  and the accompanying credentials if running in a secure environment) is persisted, RM can pick up the
+  application from the state-store and kick off the application as usual if the application
+  is not yet completed (i.e. failed, killed, finished).
+
+  NodeManagers and clients keep retrying during the down-time of RM until
+  RM comes back up. RM will send a re-sync command to all the NodeManagers and ApplicationMasters
+  it was talking to. Today, the behavior of NodeManagers and ApplicationMasters on such
+  a notification is: NMs kill their running containers and re-register with RM,
+  while AMs are simply killed by the re-sync command. RM, upon restart, will create a new
+  attempt (i.e. ApplicationMaster) based on the ApplicationSubmissionContext read
+  from the state-store and also load the credentials if running in a secure environment. After that,
+  it schedules the application to run as usual. As a consequence, the previously running
+  applications' work is lost in this manner.
+
+* {Configurations}
+
+  This section describes the configurations involved in enabling the RM Restart feature.
+
+  * Enable ResourceManager Restart functionality.
+
+    To enable RM Restart functionality, set the following property in <<conf/yarn-site.xml>> to true:
+
+*--------------------------------------+--------------------------------------+
+|| Property || Value |
+*--------------------------------------+--------------------------------------+
+| <<<yarn.resourcemanager.recovery.enabled>>> | |
+| | <<<true>>> |
+*--------------------------------------+--------------------------------------+
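+
+    For illustration, a minimal <<conf/yarn-site.xml>> snippet that turns on RM
+    state recovery could look like the following. This is only a sketch, not an
+    exhaustive configuration; the state-store itself is configured in the
+    sections below.
+
++---+
+<!-- Illustrative only: enable RM restart/recovery. -->
+<property>
+  <name>yarn.resourcemanager.recovery.enabled</name>
+  <value>true</value>
+</property>
++---+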
+
+  * Configure the state-store that is used to persist the RM state.
+
+*--------------------------------------+--------------------------------------+
+|| Property || Description |
+*--------------------------------------+--------------------------------------+
+| <<<yarn.resourcemanager.store.class>>> | |
+| | The class name of the state-store to be used for saving application/attempt |
+| | state and the credentials. The available state-store implementations are |
+| | <<<org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore>>> |
+| | , a ZooKeeper based state-store implementation, and |
+| | <<<org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore>>> |
+| | , a state-store implementation based on a Hadoop FileSystem such as HDFS. |
+| | The default value is set to |
+| | <<<org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore>>>. |
+*--------------------------------------+--------------------------------------+
+
+  * Configurations when using the Hadoop FileSystem based state-store implementation.
+
+    Configure the URI where the RM state will be saved in the Hadoop FileSystem state-store.
+
+*--------------------------------------+--------------------------------------+
+|| Property || Description |
+*--------------------------------------+--------------------------------------+
+| <<<yarn.resourcemanager.fs.state-store.uri>>> | |
+| | URI pointing to the location of the FileSystem path where the RM state will |
+| | be stored (e.g. hdfs://localhost:9000/rmstore). |
+| | Default value is <<<${hadoop.tmp.dir}/yarn/system/rmstore>>>. |
+| | If the FileSystem name is not provided, <<<fs.defaultFS>>> specified in |
+| | <<conf/core-site.xml>> will be used. |
+*--------------------------------------+--------------------------------------+
+
+    Configure the retry policy the state-store client uses to connect with the Hadoop
+    FileSystem.
+
+*--------------------------------------+--------------------------------------+
+|| Property || Description |
+*--------------------------------------+--------------------------------------+
+| <<<yarn.resourcemanager.fs.state-store.retry-policy-spec>>> | |
+| | Hadoop FileSystem client retry policy specification. Hadoop FileSystem client retry |
+| | is always enabled. Specified in pairs of sleep-time and number-of-retries, |
+| | i.e. (t0, n0), (t1, n1), ...; the first n0 retries sleep t0 milliseconds on |
+| | average, the following n1 retries sleep t1 milliseconds on average, and so on. |
+| | Default value is (2000, 500). |
+*--------------------------------------+--------------------------------------+
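+
+    As an illustration, the Hadoop FileSystem based state-store backed by HDFS
+    could be configured with a snippet along the following lines in
+    <<conf/yarn-site.xml>>. The HDFS URI below is only a placeholder taken from
+    the example in the table above.
+
++---+
+<!-- Illustrative only: use the HDFS backed state-store. -->
+<property>
+  <name>yarn.resourcemanager.store.class</name>
+  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
+</property>
+<property>
+  <name>yarn.resourcemanager.fs.state-store.uri</name>
+  <value>hdfs://localhost:9000/rmstore</value>
+</property>
++---+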
+
+  * Configurations when using the ZooKeeper based state-store implementation.
+
+    Configure the ZooKeeper server address and the root path where the RM state is stored.
+
+*--------------------------------------+--------------------------------------+
+|| Property || Description |
+*--------------------------------------+--------------------------------------+
+| <<<yarn.resourcemanager.zk-address>>> | |
+| | Comma separated list of Host:Port pairs. Each corresponds to a ZooKeeper server |
+| | (e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002") to be used by the RM |
+| | for storing RM state. |
+*--------------------------------------+--------------------------------------+
+| <<<yarn.resourcemanager.zk-state-store.parent-path>>> | |
+| | The full path of the root znode where RM state will be stored. |
+| | Default value is /rmstore. |
+*--------------------------------------+--------------------------------------+
+
+    Configure the retry policy the state-store client uses to connect with the ZooKeeper server.
+
+*--------------------------------------+--------------------------------------+
+|| Property || Description |
+*--------------------------------------+--------------------------------------+
+| <<<yarn.resourcemanager.zk-num-retries>>> | |
+| | Number of times RM tries to connect to the ZooKeeper server if the connection is lost. |
+| | Default value is 500. |
+*--------------------------------------+--------------------------------------+
+| <<<yarn.resourcemanager.zk-retry-interval-ms>>> | |
+| | The interval in milliseconds between retries when connecting to a ZooKeeper server. |
+| | Default value is 2 seconds. |
+*--------------------------------------+--------------------------------------+
+| <<<yarn.resourcemanager.zk-timeout-ms>>> | |
+| | ZooKeeper session timeout in milliseconds. This configuration is used by |
+| | the ZooKeeper server to determine when the session expires. Session expiration |
+| | happens when the server does not hear from the client (i.e. no heartbeat) within the session |
+| | timeout period specified by this configuration. Default |
+| | value is 10 seconds. |
+*--------------------------------------+--------------------------------------+
+
+    Configure the ACLs to be used for setting permissions on ZooKeeper znodes.
+
+*--------------------------------------+--------------------------------------+
+|| Property || Description |
+*--------------------------------------+--------------------------------------+
+| <<<yarn.resourcemanager.zk-acl>>> | |
+| | ACLs to be used for setting permissions on ZooKeeper znodes. Default value is <<<world:anyone:rwcda>>>. |
+*--------------------------------------+--------------------------------------+
+
+  * Configure the maximum number of application attempts.
+
+*--------------------------------------+--------------------------------------+
+|| Property || Description |
+*--------------------------------------+--------------------------------------+
+| <<<yarn.resourcemanager.am.max-attempts>>> | |
+| | The maximum number of application attempts. It is a global |
+| | setting for all ApplicationMasters. Each ApplicationMaster can specify |
+| | its individual maximum number of application attempts via the API, but the |
+| | individual number cannot be more than the global upper bound. If it is, |
+| | the RM will override it. The default value is set to 2, to |
+| | allow at least one retry for the AM. |
+*--------------------------------------+--------------------------------------+
+
+    This configuration's impact is in fact beyond the RM restart scope. It controls
+    the maximum number of attempts an application can have. In RM Restart Phase 1,
+    this configuration is needed since, as described earlier, each time RM restarts
+    it kills the previously running attempt (i.e. ApplicationMaster) and
+    creates a new attempt. Therefore, each occurrence of RM restart causes the
+    attempt count to increase by 1. In RM Restart Phase 2, this configuration is not
+    needed since the previously running ApplicationMaster will
+    not be killed and the AM will just re-sync back with RM after RM restarts.
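+
+    Putting the pieces together, a minimal <<conf/yarn-site.xml>> sketch that
+    enables RM restart with the ZooKeeper based state-store might look as follows.
+    The ZooKeeper quorum address reuses the illustrative value from the table
+    above, and every value shown is an example rather than a recommendation.
+
++---+
+<!-- Illustrative end-to-end sketch: RM restart with the ZooKeeper state-store. -->
+<property>
+  <name>yarn.resourcemanager.recovery.enabled</name>
+  <value>true</value>
+</property>
+<property>
+  <name>yarn.resourcemanager.store.class</name>
+  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
+</property>
+<property>
+  <name>yarn.resourcemanager.zk-address</name>
+  <value>127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002</value>
+</property>
+<property>
+  <name>yarn.resourcemanager.am.max-attempts</name>
+  <value>2</value>
+</property>
++---+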