diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PlacementConstraints.md.vm b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PlacementConstraints.md.vm
index cb34c3f6a14..80b644f7420 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PlacementConstraints.md.vm
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PlacementConstraints.md.vm
@@ -28,7 +28,7 @@ YARN allows applications to specify placement constraints in the form of data lo
 For example, it may be beneficial to co-locate the allocations of a job on the same rack (*affinity* constraints) to reduce network costs, spread allocations across machines (*anti-affinity* constraints) to minimize resource interference, or allow up to a specific number of allocations in a node group (*cardinality* constraints) to strike a balance between the two. Placement decisions also affect resilience. For example, allocations placed within the same cluster upgrade domain would go offline simultaneously.
 
-The applications can specify constraints without requiring knowledge of the underlying topology of the cluster (e.g., one does not need to specify the specific node or rack where their containers should be placed with constraints) or the other applications deployed. Currently **intra-application** constraints are supported, but the design that is followed is generic and support for constraints across applications will soon be added. Moreover, all constraints at the moment are **hard**, that is, if the constraints for a container cannot be satisfied due to the current cluster condition or conflicting constraints, the container request will remain pending or get rejected.
+The applications can specify constraints without requiring knowledge of the underlying topology of the cluster (e.g., one does not need to specify the specific node or rack where their containers should be placed with constraints) or the other applications deployed. Currently, all constraints are **hard**, that is, if the constraints for a container cannot be satisfied due to the current cluster condition or conflicting constraints, the container request will remain pending or get rejected.
 
 Note that in this document we use the notion of “allocation” to refer to a unit of resources (e.g., CPU and memory) that gets allocated in a node. In the current implementation of YARN, an allocation corresponds to a single container. However, in case an application uses an allocation to spawn more than one container, an allocation could correspond to multiple containers.
 
@@ -49,7 +49,7 @@ To enable placement constraints, the following property has to be set to `placem
 We now give more details about each of the three placement constraint handlers:
 
 * `placement-processor`: Using this handler, the placement of containers with constraints is determined as a pre-processing step before the capacity or the fair scheduler is called. Once the placement is decided, the capacity/fair scheduler is invoked to perform the actual allocation. The advantage of this handler is that it supports all constraint types (affinity, anti-affinity, cardinality). Moreover, it considers multiple containers at a time, which makes it possible to satisfy more constraints than a container-at-a-time approach can achieve. As it sits outside the main scheduler, it can be used by both the capacity and fair schedulers. Note that at the moment it does not account for task priorities within an application, given that such priorities might conflict with the placement constraints.
-* `scheduler`: Using this handler, containers with constraints will be placed by the main scheduler (as of now, only the capacity scheduler supports SchedulingRequests). It currently supports anti-affinity constraints (no affinity or cardinality). The advantage of this handler, when compared to the `placement-processor`, is that it follows the same ordering rules for queues (sorted by utilization, priority), apps (sorted by FIFO/fairness/priority) and tasks within the same app (priority) that are enforced by the existing main scheduler.
+* `scheduler`: Using this handler, containers with constraints will be placed by the main scheduler (as of now, only the capacity scheduler supports SchedulingRequests). The advantage of this handler, when compared to the `placement-processor`, is that it follows the same ordering rules for queues (sorted by utilization, priority), apps (sorted by FIFO/fairness/priority) and tasks within the same app (priority) that are enforced by the existing main scheduler.
 * `disabled`: Using this handler, if an application issues a SchedulingRequest, the corresponding allocate call will be rejected.
 
 The `placement-processor` handler supports a wider range of constraints and can allow more containers to be placed, especially when applications have demanding constraints or the cluster is highly utilized (due to considering multiple containers at a time). However, if respecting task priority within an application is important for the user and the capacity scheduler is used, then the `scheduler` handler should be used instead.
@@ -65,15 +65,16 @@ $ yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar share/ha
 where **PlacementSpec** is of the form:
 
 ```
-PlacementSpec => "" | KeyVal;PlacementSpec
-KeyVal        => SourceTag=Constraint
-SourceTag     => String
-Constraint    => NumContainers | NumContainers,"IN",Scope,TargetTag | NumContainers,"NOTIN",Scope,TargetTag | NumContainers,"CARDINALITY",Scope,TargetTag,MinCard,MaxCard
-NumContainers => int
-Scope         => "NODE" | "RACK"
-TargetTag     => String
-MinCard       => int
-MaxCard       => int
+PlacementSpec    => "" | KeyVal;PlacementSpec
+KeyVal           => SourceTag=Constraint
+SourceTag        => String
+Constraint       => NumContainers | NumContainers, SingleConstraint | NumContainers, AND(SingleConstraint:SingleConstraint) | NumContainers, OR(SingleConstraint:SingleConstraint)
+SingleConstraint => "IN",Scope,TargetTag | "NOTIN",Scope,TargetTag | "CARDINALITY",Scope,TargetTag,MinCard,MaxCard
+NumContainers    => int
+Scope            => "NODE" | "RACK"
+TargetTag        => String
+MinCard          => int
+MaxCard          => int
 ```
 
 Note that when the `-placement_spec` argument is specified in the distributed shell command, the `-num_containers` argument should not be used. In case the `-num_containers` argument is used in conjunction with `-placement_spec`, the former is ignored. This is because in PlacementSpec, we determine the number of containers per tag, making `-num_containers` redundant and possibly conflicting. Moreover, if `-placement_spec` is used, all containers will be requested with GUARANTEED execution type.
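+
+As a minimal illustration (the jar path, shell command, and tag name below are placeholders), the following request asks for three "zk" containers such that no two of them land on the same node:
+```
+$ yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar <distributed-shell-jar> -shell_command sleep -shell_args 60 -placement_spec zk=3,NOTIN,NODE,zk
+```
+Here `zk=3,NOTIN,NODE,zk` matches the `NumContainers, SingleConstraint` production above: request 3 containers with source tag "zk", with anti-affinity at `NODE` scope to the allocation tag "zk" itself.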
@@ -87,6 +88,11 @@ The above encodes two constraints:
 * place 5 containers with tag "hbase" with affinity to a rack on which containers with tag "zk" are running (i.e., an "hbase" container should be placed at a rack where a "zk" container is running, given that "zk" is the TargetTag of the second constraint);
 * place 7 containers with tag "spark" in nodes that have at least one, but no more than three, containers with tag "hbase".
 
+Another example below demonstrates a composite form of constraint:
+```
+zk=5,AND(IN,RACK,hbase:NOTIN,NODE,zk)
+```
+The above constraint uses the conjunction operator `AND` to combine two constraints; it is satisfied only when both child constraints are satisfied. It places 5 "zk" containers in a rack on which "hbase" containers are running, while allowing at most one "zk" container per node. The `OR` operator is supported as well. Note that in a real scenario, ZooKeeper and HBase would be different applications, so the allocation tags must be declared with namespaces (see [Allocation tags namespace](#Allocation_tags_namespace)) in order to express such inter-app constraints.
 
 Defining Placement Constraints
 ------------------------------
@@ -98,11 +104,24 @@ Allocation tags are string tags that an application can associate with (groups o
 
 Note that instead of using the `ResourceRequest` object to define allocation tags, we use the new `SchedulingRequest` object. This has many similarities with the `ResourceRequest`, but better separates the sizing of the requested allocations (number and size of allocations, priority, execution type, etc.), and the constraints dictating how these allocations should be placed (resource name, relaxed locality). Applications can still use `ResourceRequest` objects, but in order to define allocation tags and constraints, they need to use the `SchedulingRequest` object. Within a single `AllocateRequest`, an application should use either the `ResourceRequest` or the `SchedulingRequest` objects, but not both of them.
 
+$H4 Allocation tags namespace
+
+Allocation tags can be bound to a namespace, which defines the effective scope of the given allocation tags when they are checked against a placement constraint. Depending on the namespace, a constraint is checked within a single application, within a certain group of applications, or globally across the cluster. The following namespaces are supported:
+
+| Namespace | Syntax | Description |
+|:--------- |:-------|:------------|
+| SELF | `self/${allocationTag}` | Placement constraint will only be checked against the allocation tag within the current application. |
+| NOT_SELF | `not-self/${allocationTag}` | Placement constraint will only be checked against the allocation tag of all applications except the current application. |
+| ALL | `all/${allocationTag}` | Placement constraint will be checked against the allocation tag of all applications. |
+| APP_ID | `app-id/${applicationID}/${allocationTag}` | Placement constraint will only be checked against the allocation tag from the specific application identified by the application ID. |
+| APP_TAG | `app-tag/application_tag_name/${allocationTag}` | Placement constraint will only be checked against the allocation tag from applications with the specified application tag. |
+
+If no namespace is specified, allocation tags are bound to `SELF` by default, which means the placement constraints with these tags are interpreted as **intra-app** constraints. Other types of namespaces provide the flexibility to define tags for **inter-app** constraints.
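+
+For instance, the same "hbase" tag can be scoped differently depending on its namespace prefix (the application tag "hbase-service" below is hypothetical):
+```
+self/hbase                   // only this application's "hbase" containers
+all/hbase                    // "hbase" containers of any application
+app-tag/hbase-service/hbase  // "hbase" containers of applications tagged "hbase-service"
+```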
+
 $H4 Differences between node labels, node attributes and allocation tags
 
 The difference between allocation tags and node labels or node attributes (YARN-3409) is that allocation tags are attached to allocations and not to nodes. When an allocation gets allocated to a node by the scheduler, the set of tags of that allocation is automatically added to the node for the duration of the allocation. Hence, a node inherits the tags of the allocations that are currently allocated to the node. Likewise, a rack inherits the tags of its nodes. Moreover, similar to node labels and unlike node attributes, allocation tags have no value attached to them. As we show below, our constraints can refer to allocation tags, as well as node labels and node attributes.
-
 $H3 Placement constraints API
 
 Applications can use the public API in the `PlacementConstraints` class to construct placement constraints. Before describing the methods for building constraints, we describe the methods of the `PlacementTargets` class that are used to construct the target expressions that will then be used in constraints:
@@ -110,7 +129,7 @@ Applications can use the public API in the `PlacementConstraints` to construct p
 | Method | Description |
 |:------ |:----------- |
 | `allocationTag(String... allocationTags)` | Constructs a target expression on an allocation tag. It is satisfied if there are allocations with one of the given tags. |
-| `allocationTagToIntraApp(String... allocationTags)` | similar to `allocationTag(String...)`, but targeting only the containers of the application that will use this target (intra-application constraints). |
+| `allocationTagWithNamespace(String namespace, String... allocationTags)` | Similar to `allocationTag(String...)`, but allows specifying a namespace for the given allocation tags. |
 | `nodePartition(String... nodePartitions)` | Constructs a target expression on a node partition. It is satisfied for nodes that belong to one of the `nodePartitions`. |
 | `nodeAttribute(String attributeKey, String... attributeValues)` | Constructs a target expression on a node attribute. It is satisfied if the specified node attribute has one of the specified values. |
@@ -136,4 +155,4 @@ Applications have to specify the containers for which each constraint will be en
 
 When using the `placement-processor` handler (see [Enabling placement constraints](#Enabling_placement_constraints)), this constraint mapping is specified within the `RegisterApplicationMasterRequest`.
 
 When using the `scheduler` handler, the constraints can also be added at each `SchedulingRequest` object. Each such constraint is valid for the tag of that scheduling request. In case constraints are specified both at the `RegisterApplicationMasterRequest` and the scheduling requests, the latter override the former.
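+
+As an end-to-end illustration, the following is a minimal, hypothetical sketch (the class name and tag names are made up, and the surrounding AM logic is omitted) of how an application master could build a composite constraint with the `PlacementConstraints` API and attach it to the "zk" source tag at registration, assuming the `placement-processor` handler is enabled:
+```
+import java.util.Collections;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterRequest;
+import org.apache.hadoop.yarn.api.resource.PlacementConstraint;
+
+import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.NODE;
+import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.RACK;
+import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.and;
+import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.build;
+import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.targetIn;
+import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.targetNotIn;
+import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.PlacementTargets.allocationTag;
+import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.PlacementTargets.allocationTagWithNamespace;
+
+// Hypothetical example class; not part of the YARN distribution.
+public class PlacementConstraintExample {
+
+  public static RegisterApplicationMasterRequest buildRegisterRequest() {
+    // Composite constraint mirroring the AND example earlier: place "zk"
+    // containers on racks that run "hbase" containers of any application
+    // (namespace "all"), while keeping "zk" containers apart at NODE scope.
+    PlacementConstraint zkConstraint =
+        build(and(
+            targetIn(RACK, allocationTagWithNamespace("all", "hbase")),
+            targetNotIn(NODE, allocationTag("zk"))));
+
+    // Map the constraint to the source tags it governs; this mapping is
+    // sent to the RM as part of AM registration.
+    Map<Set<String>, PlacementConstraint> constraints =
+        Collections.singletonMap(Collections.singleton("zk"), zkConstraint);
+
+    // Host, port, and tracking URL are placeholder values here.
+    RegisterApplicationMasterRequest request =
+        RegisterApplicationMasterRequest.newInstance("", -1, "");
+    request.setPlacementConstraints(constraints);
+    return request;
+  }
+}
+```
+A constraint registered this way applies to every `SchedulingRequest` whose allocation tags match the corresponding source-tag key ("zk" above); with the `scheduler` handler, constraints set directly on a `SchedulingRequest` override it, as noted above.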