Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- 2.2.2
- None
Description
While performing an upgrade, the status of a request or stage can fail to match that of its tasks. For example, a task can be HOLDING while its stage and request are still IN_PROGRESS.
I believe that AMBARI-15011 is responsible for this issue. AMBARI-15011 introduced, among other things, a cache for the HostRoleCommandStatusSummaryDTO, which is an aggregation of the number of tasks a stage has in each state (PENDING, HOLDING, etc.).
This HostRoleCommandStatusSummaryDTO is used by CalculatedState to calculate a stage's and request's status based on the tasks.
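To make the dependency concrete, here is a minimal sketch of deriving a stage status from per-state task counts. Everything in it is hypothetical: the class name, the reduced set of states, and the precedence rule (any HOLDING task holds the stage) are simplifying assumptions, not the actual CalculatedState logic.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; these are not Ambari's actual classes.
class StageStatusSketch {
  // Reduced set of task states, for illustration only.
  enum Status { PENDING, IN_PROGRESS, HOLDING, COMPLETED }

  // Aggregate task states into per-state counts (the role the summary DTO plays).
  static Map<Status, Integer> summarize(List<Status> tasks) {
    Map<Status, Integer> counts = new EnumMap<>(Status.class);
    for (Status s : tasks) {
      counts.merge(s, 1, Integer::sum);
    }
    return counts;
  }

  // Assumed precedence: HOLDING wins, then all-COMPLETED, then all-PENDING.
  static Status stageStatus(Map<Status, Integer> counts) {
    int total = 0;
    for (int n : counts.values()) {
      total += n;
    }
    if (counts.getOrDefault(Status.HOLDING, 0) > 0) {
      return Status.HOLDING;
    }
    if (total > 0 && counts.getOrDefault(Status.COMPLETED, 0) == total) {
      return Status.COMPLETED;
    }
    if (counts.getOrDefault(Status.PENDING, 0) == total) {
      return Status.PENDING;
    }
    return Status.IN_PROGRESS;
  }

  public static void main(String[] args) {
    // Fresh counts reflect the HOLDING task; stale counts still say IN_PROGRESS.
    Map<Status, Integer> fresh = summarize(Arrays.asList(Status.HOLDING));
    Map<Status, Integer> stale = summarize(Arrays.asList(Status.IN_PROGRESS));
    System.out.println(stageStatus(fresh)); // HOLDING
    System.out.println(stageStatus(stale)); // IN_PROGRESS
  }
}
```

The point of the sketch is that the calculated status is only as current as the counts it is given: if the counts come from a stale cache entry, the stage status is stale too.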
The problem is that ServerActionExecutor moves a task's state to HOLDING (which is reflected correctly in the database), but the cache invalidation happens inside the still-uncommitted transaction. A read that repopulates the cache before the commit then loads the old values, so stale data is re-cached. As a result, when we calculate the request and stage status, we get IN_PROGRESS instead of HOLDING:
{
  "href": "http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1?fields=*,tasks/*",
  "Stage": {
    "cluster_name": "cl1",
    "context": "Stop YARN Queues",
    "display_status": "IN_PROGRESS",
    "end_time": -1,
    "progress_percent": 35,
    "request_id": 61,
    "skippable": true,
    "stage_id": 1,
    "start_time": 1456227329191,
    "status": "IN_PROGRESS"
  },
  "tasks": [
    {
      "href": "http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1/tasks/754",
      "Tasks": {
        "attempt_cnt": 1,
        "cluster_name": "cl1",
        "command": "EXECUTE",
        "command_detail": "Before continuing, please stop all YARN queues. If yarn-site's yarn.resourcemanager.work-preserving-recovery.enabled is set to true, then you can skip this step since the clients will retry on their own.",
        "custom_command_name": "org.apache.ambari.server.serveraction.upgrades.ManualStageAction",
        "end_time": -1,
        "error_log": "errors-754.txt",
        "exit_code": 0,
        "host_name": "os-r6-mkqzcs-c10tom21unsecha-6.novalocal",
        "id": 754,
        "output_log": "output-754.txt",
        "request_id": 61,
        "role": "AMBARI_SERVER_ACTION",
        "stage_id": 1,
        "start_time": 1456227329191,
        "status": "HOLDING",
        "stderr": "",
        "stdout": "",
        "structured_out": {}
      }
    }
  ]
}
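The suspected sequence can be reproduced deterministically with a small single-threaded sketch. All names here (FakeDb, SummaryCache, run) are hypothetical stand-ins rather than Ambari classes, a single status string stands in for the cached summary, and invalidating after the commit is shown only as one plausible ordering, not as the fix that was actually applied.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; not Ambari's actual classes.
class StaleCacheSketch {
  // Stand-in for the database: readers only ever see committed values.
  static class FakeDb {
    private final Map<Long, String> committed = new HashMap<>();
    private final Map<Long, String> pending = new HashMap<>();

    void write(long stageId, String status) { pending.put(stageId, status); }

    void commit() {
      committed.putAll(pending);
      pending.clear();
    }

    String readCommitted(long stageId) { return committed.get(stageId); }
  }

  // Stand-in for the summary cache: loads from the database on a miss.
  static class SummaryCache {
    private final Map<Long, String> entries = new HashMap<>();
    private final FakeDb db;

    SummaryCache(FakeDb db) { this.db = db; }

    String get(long stageId) {
      return entries.computeIfAbsent(stageId, db::readCommitted);
    }

    void invalidate(long stageId) { entries.remove(stageId); }
  }

  // Replays the interleaving and returns the status the cache reports afterwards.
  static String run(boolean invalidateAfterCommit) {
    FakeDb db = new FakeDb();
    db.write(1L, "IN_PROGRESS");
    db.commit();
    SummaryCache cache = new SummaryCache(db);
    cache.get(1L); // warm the cache with IN_PROGRESS

    db.write(1L, "HOLDING"); // update made inside a still-open transaction
    if (!invalidateAfterCommit) {
      cache.invalidate(1L); // invalidation inside the uncommitted transaction
    }
    cache.get(1L); // a status poll arrives before the commit
    db.commit();
    if (invalidateAfterCommit) {
      cache.invalidate(1L); // alternative ordering: invalidate once committed
    }
    return cache.get(1L);
  }

  public static void main(String[] args) {
    System.out.println(run(false)); // IN_PROGRESS (stale entry was re-cached)
    System.out.println(run(true));  // HOLDING
  }
}
```

With the invalidation inside the open transaction, the poll misses the cache, reloads the last committed value, and pins IN_PROGRESS in the cache even after the commit lands.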
Issue Links
- is broken by: AMBARI-15011 Decrease the load on ambari database after cluster creation (Resolved)