diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md
index 33d2b13e06f..0afd4e1c3bb 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md
@@ -523,3 +523,45 @@ Updating a Container (Experimental - API may change in the future)
 The **DECREASE_RESOURCE** and **DEMOTE_EXECUTION_TYPE** container updates are automatic - the AM does not explicitly have to ask the NM to decrease the resources of the container. The other update types require the AM to explicitly ask the NM to update the container. If the **yarn.resourcemanager.auto-update.containers** configuration parameter is set to **true** (false by default), The RM will ensure that all container updates are automatic.
+
+Activities
+--------------------
+
+ Users often want to know what happened when they encounter an unexpected allocation result, e.g., an application can't get all of its required resources. There are so many possible causes (such as queue limit reached, user limit reached, or a node mismatch because of insufficient resources or unacceptable placement constraints) that it's difficult to locate the true cause.
+ Activities in Capacity Scheduler provide a way to collect the scheduler's internal activities, together with diagnostic information that can help solve the problem, and expose them to users. Activities support asynchronous scheduling mode and the multi-node lookup mechanism, cover most diagnostics at multiple levels (queue/app/request/node) of the scheduling process, and can show either details for every node or an aggregated summary for a set of nodes with the same allocation state and diagnostic.
+
+### Scheduler Activities
+
+ Scheduler activities focus on all activities in a single scheduling process, which helps users understand what happened when they encounter an unexpected scheduling result.
+ The following steps show how scheduler activities work:
+ * Users request scheduler activities via the REST API, which turns on scheduler activities recording for the specified node and then returns the needed, completed scheduler activities from the cache as the response. If no node is specified, the target node of the next scheduling process, which is a real node or an empty node (in multi-node mode), is taken as the specified node.
+ * The allocation process records scheduler activities of the specified node in a single scheduling process, then puts them into the cache of completed scheduler activities.
+ * The cleanup thread in ActivitiesManager periodically cleans up expired scheduler activities that have lived for a certain time (10 minutes by default).
+ See the [YARN Resource Manager REST API](ResourceManagerRest.html#Scheduler_Activities_API) for examples on how to query scheduler activities via REST.
+
+### Application Activities
+
+ Application activities focus on a specified application, allowing users to trace recent scheduling processes for that application.
+ The following steps show how application activities work:
+ * Users request application activities via the REST API, which turns on activities recording for the specified application in later scheduling processes for a certain duration and then returns the needed, completed application activities from the cache as the response.
+ * The allocation process keeps recording activities of the specified application until the duration is reached, then puts them into the cache of completed application activities.
+ * The cleanup thread in ActivitiesManager periodically cleans up expired application activities that have lived for a certain time (10 minutes by default).
+ See the [YARN Resource Manager REST API](ResourceManagerRest.html#Application_Activities_API) for examples on how to query application activities via REST.
+
+### Cleanup Configuration
+
+ The CapacityScheduler supports the following parameters to control how expired scheduler/application activities are cleaned up.
+
+| Property | Description |
+|:---- |:---- |
+| `yarn.resourcemanager.activities-manager.cleanup-interval-ms` | The cleanup interval for activities in milliseconds. Defaults to 5000. |
+| `yarn.resourcemanager.activities-manager.scheduler-activities.ttl-ms` | Time to live for scheduler activities in milliseconds. Defaults to 600000. |
+| `yarn.resourcemanager.activities-manager.app-activities.ttl-ms` | Time to live for application activities in milliseconds. Defaults to 600000. |
+| `yarn.resourcemanager.activities-manager.app-activities.max-queue-length` | Max queue length for app activities. Defaults to 1000. |
+
+### User Guide
+
+ When encountering unexpected results, such as an application that can't get its required resources, we can query application or scheduler activities to help figure out why, as follows:
+ * Request the application activities REST API with the specified application id, for example: http://rm-http-address:port/ws/v1/cluster/scheduler/app-activities?appId=application_1557733202578_0001. This operation triggers recording for the specified application and then queries the cache. The first time, there will be no app activities, so the response will contain the message "waiting for display".
+ * Request again after a while. This time the cache may hold activities of the specified application, which are then returned in the response.
+ * If there are still no activities for the specified application, the scheduler has not considered this application, because the scheduling process finished earlier or skipped its queue.
We can request scheduler activities via the scheduler activities REST API to see what happened in a single scheduling process, for example: http://rm-http-address:port/ws/v1/cluster/scheduler/activities.
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md
index 54a692e6cc7..29657ac78db 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md
@@ -5291,3 +5291,452 @@ Response Header:
 HTTP/1.1 200 OK
 Content-Type: application/xml
 Transfer-Encoding: chunked
+
+
+Scheduler Activities API
+--------------------------------
+
+The scheduler activities API currently supports Capacity Scheduler and provides a way to get scheduler activities in a single scheduling process. It triggers recording of scheduler activities in the next scheduling process and then returns the last required scheduler activities from the cache as the response. The response has a hierarchical structure with multiple levels and important scheduling details, organized in the order of the scheduling process:
+ * **Hierarchical Queues** - Hierarchical queues are those to which the scheduler has tried to allocate; each of them contains the queue name, allocation state, optional diagnostic and optional children.
+ * **Applications** - Applications are shown as children of a leaf queue; each of them contains the application name, app priority, allocation state, optional diagnostic and optional children.
+ * **Requests** - Requests are shown as children of an application; each of them contains the request name, request priority, allocation request id, allocation state and optional children.
+ * **Nodes** - Nodes are shown as children of a request; each of them contains the node id, allocation state, an optional name which should appear after a container is allocated or reserved on the node, and an optional diagnostic which should appear if allocating or reserving a container on the node failed. For aggregated nodes grouped by allocation state and diagnostic, each of them contains the allocation state, aggregated node ids and optional diagnostic.
+
+### URI
+
+ * http://rm-http-address:port/ws/v1/cluster/scheduler/activities
+
+### HTTP Operations Supported
+
+ * GET
+
+### Query Parameters Supported
+
+Multiple parameters can be specified for GET operations.
+
+ * nodeId - the node id of the node on which we want to collect scheduler activities. If this parameter is not specified, the node in the next allocation is taken as the recording target.
+ * groupingType - the grouping type used to aggregate scheduler activities, which is very useful in multi-node scheduling scenarios. Currently only STATE_AND_DIAGNOSTIC is supported, with which users can query aggregated activities grouped by allocation state and diagnostic.
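
As a rough illustration of how the two query parameters combine, the sketch below builds request URLs with Python's standard library. The address `rm-http-address:port` is the placeholder used in the URI section; the helper name `scheduler_activities_url` and the example node id are assumptions made for this sketch and are not part of YARN.

```python
from urllib.parse import urlencode

# Placeholder address taken from the URI section; replace with a real RM address.
BASE = "http://rm-http-address:port/ws/v1/cluster/scheduler/activities"


def scheduler_activities_url(node_id=None, grouping_type=None, base=BASE):
    """Build a scheduler activities URL (hypothetical helper, not a YARN API)."""
    params = {}
    if node_id is not None:
        # Restrict recording to one node.
        params["nodeId"] = node_id
    if grouping_type is not None:
        # The only grouping type named in this document is STATE_AND_DIAGNOSTIC.
        params["groupingType"] = grouping_type
    return base + ("?" + urlencode(params) if params else "")


# No parameters: the node of the next allocation becomes the recording target.
print(scheduler_activities_url())
# Specific node, aggregated by allocation state and diagnostic.
print(scheduler_activities_url(node_id="127.0.0.1:1234",
                               grouping_type="STATE_AND_DIAGNOSTIC"))
```

`urlencode` percent-encodes the colon in the node id, which a standards-compliant server decodes back to `host:port`.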
+
+### Elements of the *Queue* object
+
+| Item | Data Type | Description |
+|:---- |:---- |:---- |
+| name | string | Name of the queue |
+| allocationState | string | Final allocation state of the queue (SKIPPED, ALLOCATED, RESERVED, ALLOCATED_FROM_RESERVED, etc) |
+| diagnostic | string | Diagnostic about queue |
+| children | array of applications(JSON)/application objects(XML) | A collection of application objects |
+
+### Elements of the *Application* object
+
+| Item | Data Type | Description |
+|:---- |:---- |:---- |
+| name | string | Name of the application |
+| appPriority | string | Priority of the application |
+| allocationState | string | Final allocation state of the application (SKIPPED, ACCEPTED, etc) |
+| diagnostic | string | Diagnostic about application |
+| children | array of requests(JSON)/request objects(XML) | A collection of request objects |
+
+### Elements of the *Request* object
+
+| Item | Data Type | Description |
+|:---- |:---- |:---- |
+| name | string | Name of the request. The value format is "request__" (for example, "request_1_-1"), which can be used to identify different requests. |
+| requestPriority | string | Priority of the request |
+| allocationRequestId | string | Allocation request id of the request |
+| allocationState | string | Final allocation state of the request (SKIPPED, ALLOCATED, RESERVED, ALLOCATED_FROM_RESERVED, etc) |
+| diagnostic | string | Diagnostic about request |
+| children | array of nodes(JSON)/node objects(XML) | A collection of node objects |
+
+### Elements of the *Node* object
+
+| Item | Data Type | Description |
+|:---- |:---- |:---- |
+| name | string | Container information which is optional and can be shown when allocation state is ALLOCATED, RESERVED or ALLOCATED_FROM_RESERVED.
| +| allocationState | string | Final allocation state of the node (SKIPPED, ALLOCATED, RESERVED, ALLOCATED_FROM_RESERVED, etc) | +| diagnostic | string | Diagnostic about node in normal mode or nodes in aggregation mode | +| nodeId | string | The node id of the node which this activity belongs to | +| nodeIds | array of string(JSON)/string objects(XML) | The node id list of the nodes which this kind of activity belongs to in aggregation mode | +| count | int | The amount of nodes which this kind of activities belongs to in aggregation mode | + + +### Response Examples + +**JSON response** + +HTTP Request: + + Accept: application/json + GET http://rm-http-address:port/ws/v1/cluster/scheduler/activities + +Response Header: + + HTTP/1.1 200 OK + Content-Type: application/json + Transfer-Encoding: chunked + Server: Jetty(6.1.26) + +Response Body: + +```json +{ + "timeStamp": "Sun May 12 11:21:21 CST 2019", + "allocations": { + "finalAllocationState": "RESERVED", + "root": { + "name": "root", + "allocationState": "ACCEPTED", + "children": [ + { + "name": "a", + "allocationState": "SKIPPED", + "diagnostic": "Queue does not need more resource" + }, + { + "name": "b", + "allocationState": "ACCEPTED", + "children": [ + { + "name": "application_1557631278829_0001", + "appPriority": "0", + "allocationState": "SKIPPED", + "diagnostic": "Application does not need more resource" + }, + { + "name": "application_1557631278829_0002", + "appPriority": "0", + "allocationState": "ACCEPTED", + "children": { + "name": "request_1_-1", + "requestPriority": "1", + "allocationState": "RESERVED", + "allocationRequestId": "-1", + "children": [ + { + "allocationState": "SKIPPED", + "diagnostic": "Node does not have sufficient resource for request, insufficient resources=[memory-mb]\nrequired=, available=", + "nodeId": "127.0.0.3:1234" + }, + { + "allocationState": "SKIPPED", + "diagnostic": "Node does not have sufficient resource for request, insufficient resources=[memory-mb]\nrequired=, 
available=", + "nodeId": "127.0.0.4:1234" + }, + { + "allocationState": "SKIPPED", + "diagnostic": "Node does not have sufficient resource for request, insufficient resources=[memory-mb]\nrequired=, available=", + "nodeId": "127.0.0.2:1234" + }, + { + "name": "Container: [ContainerId: null, AllocationRequestId: -1, Version: 0, NodeId: 127.0.0.1:1234, NodeHttpAddress: 127.0.0.1:2, Resource: , Priority: 1, Token: null, ExecutionType: GUARANTEED, ]", + "allocationState": "RESERVED", + "nodeId": "127.0.0.1:1234" + } + ] + } + } + ] + } + ] + } + } +} +``` + +**XML response** + +HTTP Request: + + Accept: application/xml + GET http://rm-http-address:port/ws/v1/cluster/scheduler/activities + +Response Header: + + HTTP/1.1 200 OK + Content-Type: application/xml; charset=utf-8 + Transfer-Encoding: chunked + +Response Body: + + +```xml + + + Sun May 12 12:02:43 CST 2019 + + RESERVED + + root + ACCEPTED + + a + SKIPPED + Queue does not need more resource + + + b + ACCEPTED + + application_1557633761370_0001 + 0 + SKIPPED + Application does not need more resource + + + application_1557633761370_0002 + 0 + ACCEPTED + + request_1_-1 + 1 + RESERVED + -1 + + SKIPPED + Node does not have sufficient resource for request, insufficient resources=[memory-mb] required=<memory:4096, vCores:1>, available=<memory:2048, vCores:2> + 127.0.0.3:1234 + + + SKIPPED + Node does not have sufficient resource for request, insufficient resources=[memory-mb] required=<memory:4096, vCores:1>, available=<memory:2048, vCores:2> + 127.0.0.4:1234 + + + SKIPPED + Node does not have sufficient resource for request, insufficient resources=[memory-mb] required=<memory:4096, vCores:1>, available=<memory:2048, vCores:2> + 127.0.0.2:1234 + + + Container: [ContainerId: null, AllocationRequestId: -1, Version: 0, NodeId: 127.0.0.1:1234, NodeHttpAddress: 127.0.0.1:2, Resource: <memory:4096, vCores:1>, Priority: 1, Token: null, ExecutionType: GUARANTEED, ] + RESERVED + 127.0.0.1:1234 + + + + + + + +``` + + 
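
To make the hierarchical response easier to read, a client can flatten it into one row per leaf. The following is a minimal sketch under stated assumptions: it takes the `root` object of a JSON response shaped like the example above, and it treats `children` as either a list or a single object, since the example shows both forms. The function names and the trimmed sample data are illustrative only, and aggregated entries (which carry `nodeIds` instead of `nodeId`) are not given special handling.

```python
def as_list(value):
    """Normalize a 'children' value: absent -> [], single object -> [object]."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def leaf_activities(activity, path=()):
    """Yield (path, allocationState, diagnostic) for each leaf of the tree."""
    # Node entries carry nodeId; queues, applications and requests carry name.
    label = activity.get("nodeId") or activity.get("name") or "?"
    here = path + (label,)
    children = as_list(activity.get("children"))
    if not children:
        yield "/".join(here), activity.get("allocationState"), activity.get("diagnostic")
    for child in children:
        yield from leaf_activities(child, here)


# Trimmed sample in the shape of the JSON response above (not real output).
SAMPLE_ROOT = {
    "name": "root",
    "allocationState": "ACCEPTED",
    "children": [
        {"name": "a", "allocationState": "SKIPPED",
         "diagnostic": "Queue does not need more resource"},
        {"name": "b", "allocationState": "ACCEPTED",
         "children": {
             "name": "application_1557631278829_0002",
             "allocationState": "ACCEPTED",
             "children": {
                 "name": "request_1_-1",
                 "allocationState": "RESERVED",
                 "children": [
                     {"allocationState": "RESERVED", "nodeId": "127.0.0.1:1234"},
                 ],
             },
         }},
    ],
}

for path, state, diagnostic in leaf_activities(SAMPLE_ROOT):
    print(path, state, diagnostic or "")
```

Each printed row names the full queue/application/request/node path, so a skipped queue and a reserved node are visible side by side.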
+Application Activities API
+--------------------------------
+
+ The application activities API currently supports Capacity Scheduler only and provides a way to get application activities. It triggers recording of activities for the specified application for a specified duration (for example, 3 seconds) from now on and then returns the required activities from the cache as the response. The response has a hierarchical structure with multiple levels:
+ * **Applications** - Each application contains the application id and allocations, which are scheduling attempts in different scheduling processes, organized in decreasing timestamp order.
+ * **Application Scheduling Attempts** - Each application scheduling attempt contains the queue name, app priority, timestamp, datetime, allocation state, optional diagnostic and optional request allocation.
+ * **Requests** - Requests are shown as children of an application scheduling attempt; each of them contains the request name, request priority, allocation request id, allocation state and optional children.
+ * **Nodes** - Nodes are shown as children of a request; each of them contains the node id, allocation state, an optional name which should appear after a container is allocated or reserved on the node, and an optional diagnostic which should appear if allocating or reserving a container on the node failed. For aggregated nodes grouped by allocation state and diagnostic, each of them contains the allocation state, aggregated node ids and optional diagnostic.
+
+### URI
+
+ * http://rm-http-address:port/ws/v1/cluster/scheduler/app-activities
+
+### HTTP Operations Supported
+
+ * GET
+
+### Query Parameters Supported
+
+Multiple parameters can be specified for GET operations.
+
+ * appId - the application id.
+ * maxTime - the max duration in seconds from now on for recording application activities. If not specified, this will default to 3 (seconds).
+ * requestPriorities - the priorities of the requests, specified as a comma-separated list.
+ * allocationRequestIds - the allocation request ids of the requests, specified as a comma-separated list.
+ * groupingType - the grouping type used to aggregate app activities, which is very useful in multi-node scheduling scenarios. Currently only STATE_AND_DIAGNOSTIC is supported, with which users can query aggregated activities grouped by allocation state and diagnostic.
+
+### Elements of the *Application* object
+
+| Item | Data Type | Description |
+|:---- |:---- |:---- |
+| applicationId | string | The application id |
+| allocations | array of application attempts(JSON)/application attempt objects(XML) | A collection of application attempts |
+
+
+### Elements of the *Application Scheduling Attempt* object
+
+| Item | Data Type | Description |
+|:---- |:---- |:---- |
+| queueName | string | Name of the queue |
+| appPriority | string | Priority of the application |
+| timestamp | string | Timestamp of the application scheduling attempt |
+| dateTime | string | Date time of the application scheduling attempt |
+| allocationState | string | Final allocation state of the application (SKIPPED, ALLOCATED, RESERVED, ALLOCATED_FROM_RESERVED, etc) |
+| requestAllocation | array of requests(JSON)/request objects(XML) | A collection of request objects |
+
+### Elements of the *Request* object
+
+| Item | Data Type | Description |
+|:---- |:---- |:---- |
+| requestPriority | string | Priority of the request |
+| allocationRequestId | string | Allocation request id of the request |
+| allocationState | string | Final allocation state of the request (SKIPPED, ALLOCATED, RESERVED, ALLOCATED_FROM_RESERVED, etc) |
+| allocationAttempt | array of nodes(JSON)/node objects(XML) | A collection of node objects |
+
+### Elements of the *Node* object
+
+| Item | Data Type | Description |
+|:---- |:---- |:---- |
+| name | string | Container information which is optional and can be shown when allocation state is ALLOCATED, RESERVED or ALLOCATED_FROM_RESERVED.
| +| allocationState | string | Final allocation state of the node (SKIPPED, ALLOCATED, RESERVED, ALLOCATED_FROM_RESERVED, etc) | +| diagnostic | string | Diagnostic about node in normal mode or nodes in aggregation mode | +| nodeId | string | The node id of the node which this activity belongs to | +| nodeIds | array of string(JSON)/string objects(XML) | The node id list of the nodes which this kind of activity belongs to in aggregation mode | +| count | int | The amount of nodes which this kind of activities belongs to in aggregation mode | + + +### Response Examples + +**JSON response** + +HTTP Request: + + Accept: application/json + GET http://rm-http-address:port/ws/v1/cluster/scheduler/app-activities + +Response Header: + + HTTP/1.1 200 OK + Content-Type: application/json + Transfer-Encoding: chunked + Server: Jetty(6.1.26) + +Response Body: + +```json +{ + "applicationId": "application_1557733202578_0001", + "allocations": [ + { + "queueName": "b", + "appPriority": "0", + "timestamp": "1557560677460", + "dateTime": "Sat May 11 15:44:37 CST 2019", + "allocationState": "RESERVED", + "requestAllocation": { + "requestPriority": "1", + "allocationRequestId": "-1", + "allocationState": "RESERVED", + "allocationAttempt": [ + { + "name": "Container-Id-Not-Assigned", + "allocationState": "SKIPPED", + "diagnostic": "Node does not have sufficient resource for request, insufficient resources=[memory-mb]\nrequired=, available=", + "nodeId": "127.0.0.3:1234" + }, + { + "name": "Container-Id-Not-Assigned", + "allocationState": "SKIPPED", + "diagnostic": "Node does not have sufficient resource for request, insufficient resources=[memory-mb]\nrequired=, available=", + "nodeId": "127.0.0.4:1234" + }, + { + "name": "Container-Id-Not-Assigned", + "allocationState": "SKIPPED", + "diagnostic": "Node does not have sufficient resource for request, insufficient resources=[memory-mb]\nrequired=, available=", + "nodeId": "127.0.0.2:1234" + }, + { + "name": "Container-Id-Not-Assigned", 
+ "allocationState": "RESERVED", + "nodeId": "127.0.0.1:1234" + } + ] + } + }, + { + "queueName": "b", + "appPriority": "0", + "timestamp": "1557560677450", + "dateTime": "Sat May 11 15:44:37 CST 2019", + "allocationState": "ACCEPTED", + "requestAllocation": { + "requestPriority": "0", + "allocationRequestId": "-1", + "allocationState": "ALLOCATED", + "allocationAttempt": { + "name": "Container-Id-Not-Assigned", + "allocationState": "ALLOCATED", + "nodeId": "127.0.0.2:1234" + } + } + } + ] +} +``` + +**XML response** + +HTTP Request: + + Accept: application/xml + GET http://rm-http-address:port/ws/v1/cluster/scheduler/app-activities + +Response Header: + + HTTP/1.1 200 OK + Content-Type: application/xml; charset=utf-8 + Transfer-Encoding: chunked + +Response Body: + +```xml + + + + application_1557733321516_0001 + + b + 0 + 1557560677460 + Sat May 11 15:44:37 CST 2019 + RESERVED + + 1 + -1 + RESERVED + + Container-Id-Not-Assigned + SKIPPED + Node does not have sufficient resource for request, insufficient resources=[memory-mb] required=<memory:4096, vCores:1>, available=<memory:2048, vCores:2> + 127.0.0.3:1234 + + + Container-Id-Not-Assigned + SKIPPED + Node does not have sufficient resource for request, insufficient resources=[memory-mb] required=<memory:4096, vCores:1>, available=<memory:2048, vCores:2> + 127.0.0.4:1234 + + + Container-Id-Not-Assigned + SKIPPED + Node does not have sufficient resource for request, insufficient resources=[memory-mb] required=<memory:4096, vCores:1>, available=<memory:2048, vCores:2> + 127.0.0.2:1234 + + + Container-Id-Not-Assigned + RESERVED + 127.0.0.1:1234 + + + + + b + 0 + 1557560677450 + Sat May 11 15:44:37 CST 2019 + ACCEPTED + + 0 + -1 + ALLOCATED + + Container-Id-Not-Assigned + ALLOCATED + 127.0.0.2:1234 + + + + +``` \ No newline at end of file
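
When an application activities response lists many nodes, it can help to count node attempts per allocation state and diagnostic on the client side. The sketch below does this for a JSON response shaped like the example above. It is only an illustration: the function names are made up, the sample data is trimmed and its diagnostic text shortened, and the counting is a client-side approximation rather than the server's `groupingType=STATE_AND_DIAGNOSTIC` implementation.

```python
from collections import Counter


def as_list(value):
    """Accept a list, a single object, or None (the example shows lists and single objects)."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def summarize(app_activities):
    """Count node attempts per (allocationState, diagnostic) over all allocations."""
    counts = Counter()
    for allocation in as_list(app_activities.get("allocations")):
        for request in as_list(allocation.get("requestAllocation")):
            for attempt in as_list(request.get("allocationAttempt")):
                key = (attempt.get("allocationState"), attempt.get("diagnostic"))
                counts[key] += 1
    return counts


# Trimmed sample in the shape of the JSON response above (diagnostics shortened).
SAMPLE = {
    "applicationId": "application_1557733202578_0001",
    "allocations": [
        {"allocationState": "RESERVED",
         "requestAllocation": {
             "allocationState": "RESERVED",
             "allocationAttempt": [
                 {"allocationState": "SKIPPED", "nodeId": "127.0.0.3:1234",
                  "diagnostic": "Node does not have sufficient resource for request"},
                 {"allocationState": "SKIPPED", "nodeId": "127.0.0.4:1234",
                  "diagnostic": "Node does not have sufficient resource for request"},
                 {"allocationState": "RESERVED", "nodeId": "127.0.0.1:1234"},
             ]}},
        {"allocationState": "ACCEPTED",
         "requestAllocation": {
             "allocationState": "ALLOCATED",
             "allocationAttempt": {"allocationState": "ALLOCATED",
                                   "nodeId": "127.0.0.2:1234"}}},
    ],
}

for (state, diagnostic), count in summarize(SAMPLE).items():
    print(count, state, diagnostic or "")
```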