diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/yarn-registry.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/yarn-registry.md
index f5055d9f12a..7d515c87387 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/yarn-registry.md
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/yarn-registry.md
@@ -46,9 +46,9 @@ and matching the DNS-compatible path naming scheme. Examples:
 nodes, HBase region servers and HBase REST servers.
 **Service Instance:** A single instance of an application. Example, an HBase
-cluster `demo1`. A service instance is running if the instances the components
-which for the service are running. This does not imply "live" in the
-distributed computing sense, merely that the process are running.
+cluster `demo1`. A service instance is running only if the component instances
+ for the service are running. This does not imply "live" in the
+distributed computing sense, merely meaning that the processes are running.
 **Component Instance**: a single instance of a component within a service
 instance. Examples: an HBase master node on host `rack1server6` or a region
@@ -84,8 +84,8 @@ container ID.
 ## The binding problem
 Hadoop YARN allows applications to run on the Hadoop cluster. Some of these are
-batch jobs or queries that can managed via Yarn’s existing API using its
-application ID. In addition YARN can deploy ong-lived services instances such a
+batch jobs or queries that can be managed via Yarn’s existing API using its
+application ID. In addition YARN can deploy long-lived services instances such as a
 pool of Apache Tomcat web servers or an Apache HBase cluster. YARN will deploy
 them across the cluster depending on the individual each component requirements
 and server availability.
These service instances need to be discovered by
@@ -97,8 +97,8 @@ As a result there is no easy way for clients to interact with dynamically
 deployed applications.
 YARN supports a rudimentary registry which allows YARN Application Masters to
-register a web URL and an IPC address. but is not sufficient for our purposes
-since it It does not allow any other *endpoints* to be registered —such as REST
+register a web URL and an IPC address. It is not sufficient for our purposes, however,
+as it does not allow any other *endpoints* to be registered—such as REST
 URLs, or zookeeper path or the endpoints of the tasks that the Application
 Master executes. Further, information that can be registered is mapped to the
 YARN application instance —a unique instance ID that changes whenever a YARN
@@ -130,7 +130,7 @@ Yarn-deployed services belonging to individual users.
 1. A Hadoop core service that is not running under YARN example: HDFS) can be
 registered in for discovery. This could be done by the service or by management
-tools..
+tools.
 2. A long-lived application deployed by YARN registers itself for discovery by
 clients. The registration data is intended to outlive the application master,
@@ -196,7 +196,7 @@ locate them.
 Allow dynamic registration of service instances
- * YARN deployed services instances must be able register their bindings and be
+ * YARN deployed services instances must be able to register their bindings and be
 discovered by clients.
 * Core Hadoop service instances must be able to register their service
@@ -207,14 +207,14 @@ Allow dynamic registration of service instances
 * A service instance must be able to publish a variety of endpoints for a
 service: Web UI, RPC, REST, Zookeeper, others. Furthermore one must also be
- able register certificates and other public security information may be
+ able to register certificates, and other public security information may be
 published as part of a binding.
 Registry service properties:
 * The registry must be highly available.
- * Scale: many services and many clients in a large cluster. This will limit
+ * Scale: many services and many clients can coexist in a large cluster. This will limit
 how much data a service can publish.
 * Ubiquity: we need this in every YARN cluster, whether physical, virtual or
@@ -255,7 +255,7 @@ Remote accessibility: supports remote access even on clusters which are
 We propose a base registry service that binds string-names to records
 describing service and component instances. We plan to use ZK as the base name
-service since it supports many of the properties, We pick a part of the ZK
+service since it supports many of the properties. We pick a part of the ZK
 namespace to be the root of the service registry ( default: `yarnRegistry`).
 On top this base implementation we build our registry service API and the
@@ -286,7 +286,7 @@ protocols exported by that service instance.
 type is `URL`, `protocol==IPC` binding uses the addresstype `host/port`.
 4. The *api*. This is the API offered by the endpoint, and is application
- specific. examples: `org.apache.hadoop.namenode`,
+ specific. Examples: `org.apache.hadoop.namenode`,
 `org.apache.hadoop.webhdfs`
 5. Endpoints may be *external* —for use by programs other than the service
@@ -313,7 +313,7 @@ service class names,
 instance. For a YARN-deployed application, this can be trivially derived from
 the container ID.
-The requirements for unique names ensures that the path to a service instance
+The requirements for unique names ensure that the path to a service instance
 or component instance is guaranteed to be unique, and that all instances of a
 specific service class can be enumerated by listing all children of the service
 class path.
@@ -439,7 +439,7 @@ The policies which clean up when an application, application attempt or
 container terminates require the `yarn:id` field to match that of the
 application, attempt or container.
If the wrong ID is set, the cleanup does not
 take place —and if set to a different application or container, will be cleaned
-up according the lifecycle of that application.
+up according to the lifecycle of that application.
 ### Endpoint:
@@ -529,7 +529,7 @@ The following strategies are suggested to provide unique URIs for an API
 1. The SOAP/WS-* convention of using the URL to where the WSDL defining the service
 2. A URL to the svn/git hosted document defining a REST API
-3. the `classpath` schema followed by a path to a class or package in an application.
+3. The `classpath` schema followed by a path to a class or package in an application.
 4. The `uuid` schema with a generated UUID.
 It is hoped that standard API URIs will be defined for common APIs. Two such non-normative APIs are used in this document
@@ -824,16 +824,16 @@ The `RegistryPathStatus` class summarizes the contents of a node in the registry
 The registry will allow a service instance can only be registered under the
 path where it has permissions. Yarn will create directories with appropriate
-permissions for users where Yarn deployed services can be registered by a user.
+permissions for users where Yarn deployed services can be registered by a user
 of the user account of the service instance. The admin will also create
 directories (such as `/services`) with appropriate permissions (where core Hadoop
-services can register themselves.
+services can register themselves).
-There will no attempt to restrict read access to registry information. The
+There will be no attempt to restrict read access to registry information. The
 services will protect inappropriate access by clients by requiring
 authentication and authorization. There is a *scope* field in a service record
 , but this is just a marker to say "internal API only", rather than a direct
-security restriction. (this is why "internal" and "external" are proposed, not
+security restriction.
(This is why "internal" and "external" are proposed, not
 "public" and "private").
 Rationale: the endpoints being registered would be discoverable through port
@@ -852,12 +852,11 @@ In an a non-Kerberos Zookeeper Cluster, no security policy is implemented.
 The registry is designed to be secured *on a kerberos-managed cluster*.
 * The registry root grants full rights to "system accounts":
-`mapred`, `hdfs`, `yarn` : `"rwcda"`; all other accounts, and anonymous access
-is read-only.
+`mapred`, `hdfs`, `yarn` : `"rwcda"`; all other accounts and anonymous access
+are read-only.
-* The permissions are similarly restricted for `/users`, and `/services/`
-
-* installations may extend or change these system accounts.
+* The permissions are similarly restricted for `/users`. `/services/`
+installations may extend or change these system accounts.
 * When an application belonging to a user is scheduled, YARN
 SHALL create an entry for that user `/users/${username}`.
@@ -867,17 +866,17 @@ SHALL create an entry for that user `/users/${username}`.
 their home node, —or alter its permissions.
 * Applications wishing to write to the registry must use a SASL connection
-to authenticate via Zookeeper,
+to authenticate via Zookeeper.
 * Applications creating nodes in the user path MUST include the site-specified
 system accounts in the ACL list, with full access.
-* Applications creating nodes in the user path MUST include an ACL Which
+* Applications creating nodes in the user path MUST include an ACL.
 * Applications creating nodes in the user path MUST declare their own
 user identity as a `sasl:user@REALM` entry.
-* Applications creating nodes the user path MAY add extra `digest:` ACL tokens
+* Applications creating nodes in the user path MAY add extra `digest`: ACL tokens,
 so as to give their services the ability to manipulate portions of the
 registry *without needing kerberos credentials*.
@@ -1028,6 +1027,6 @@ and cached until communications problems occur.
At that point the registry is queried for the current record, then an attempt is made to reconnect to the AM.
 Here "connectivity" problems means both "low level socket/IO errors" and
-"failures in HTTPS authentication". The agents use two-way HTTPS authentication
-—if the AM fails and another application starts listening on the same ports
+"failures in HTTPS authentication". The agents use two-way HTTPS authentication.
+If the AM fails and another application starts listening on the same ports,
 it will trigger an authentication failure and hence service record reread.