[AMBARI-15173] Express Upgrade Stuck At Manual Prompt Due To HRC Status Calculation Cache Problem - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 2.2.2
Fix Version/s: 2.2.2
Component/s: ambari-server
Labels:
None

Description

During an upgrade, the status of a request or stage can fail to match the status of its tasks. For example, a task can be HOLDING while the request is still IN_PROGRESS.

I believe that AMBARI-15011 is responsible for this issue. AMBARI-15011 introduced, among other things, a cache for the HostRoleCommandStatusSummaryDTO, which is an aggregation of the number of tasks a stage has in each state (PENDING, HOLDING, etc.).

This HostRoleCommandStatusSummaryDTO is used by CalculatedState to calculate a stage's and request's status based on the tasks.
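As a rough illustration of the idea, the sketch below aggregates task states into per-state counts and derives a stage status from them. It is not Ambari's CalculatedState logic: the class `StatusSummarySketch`, the reduced `Status` enum, and the precedence rules are all assumptions made up for this example.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in; none of these names are Ambari classes.
class StatusSummarySketch {
    enum Status { PENDING, IN_PROGRESS, HOLDING, COMPLETED }

    // Aggregate task states into a count per state, similar in spirit to the summary DTO.
    static Map<Status, Integer> summarize(Status... tasks) {
        Map<Status, Integer> counts = new EnumMap<>(Status.class);
        for (Status s : tasks) {
            counts.merge(s, 1, Integer::sum);
        }
        return counts;
    }

    // Assumed precedence: any HOLDING task makes the whole stage HOLDING.
    static Status stageStatus(Map<Status, Integer> counts) {
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        if (counts.getOrDefault(Status.HOLDING, 0) > 0) {
            return Status.HOLDING;
        }
        if (total > 0 && counts.getOrDefault(Status.COMPLETED, 0) == total) {
            return Status.COMPLETED;
        }
        if (total == 0 || counts.getOrDefault(Status.PENDING, 0) == total) {
            return Status.PENDING;
        }
        return Status.IN_PROGRESS;
    }

    public static void main(String[] args) {
        // A single HOLDING task, as in the stage shown in this issue.
        System.out.println(stageStatus(summarize(Status.HOLDING)));
    }
}
```

Because the stage status is derived purely from the counts, a stale summary yields a stale stage status even when the task rows themselves are correct.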

The problem is that ServerActionExecutor moves a task's state to HOLDING (reflected correctly in the database), but the cache invalidation happens inside the still-uncommitted transaction. A reader that misses the cache before the commit reloads the old data and re-caches it. So, when we go to calculate the request and stage status, we get IN_PROGRESS instead of HOLDING.
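The ordering problem can be reproduced in miniature. The sketch below is a single-threaded simulation, not Ambari code: the two maps stand in for the committed database state and the summary cache, and every name in it (`StaleCacheSketch`, `stageStatus`, and so on) is hypothetical. It assumes readers only see committed data, which is what lets the stale value get re-cached.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation; the maps stand in for the database and the summary cache.
class StaleCacheSketch {
    // Committed database state: task id -> status.
    static final Map<Long, String> committed = new HashMap<>();
    // Cache: stage id -> status derived from the stage's single task.
    static final Map<Long, String> cache = new HashMap<>();

    // Readers see only committed data; a cache miss reloads from it.
    static String stageStatus(long stageId, long taskId) {
        return cache.computeIfAbsent(stageId, id -> committed.get(taskId));
    }

    // Put the task in IN_PROGRESS and warm the cache.
    static void reset() {
        committed.put(754L, "IN_PROGRESS");
        cache.clear();
        stageStatus(1L, 754L);
    }

    // Problematic ordering: invalidate while the update is still uncommitted.
    static String invalidateBeforeCommit() {
        reset();
        String pending = "HOLDING";   // written inside the open transaction
        cache.remove(1L);             // invalidation inside the transaction
        stageStatus(1L, 754L);        // a reader reloads and re-caches the old value
        committed.put(754L, pending); // the transaction commits
        return stageStatus(1L, 754L); // served from the stale cache entry
    }

    // Safer ordering: invalidate only after the commit is visible.
    static String invalidateAfterCommit() {
        reset();
        committed.put(754L, "HOLDING"); // the transaction commits
        cache.remove(1L);               // invalidation after the commit
        return stageStatus(1L, 754L);   // reloads the committed value
    }

    public static void main(String[] args) {
        System.out.println(invalidateBeforeCommit()); // prints IN_PROGRESS
        System.out.println(invalidateAfterCommit());  // prints HOLDING
    }
}
```

Whether the actual fix moves the invalidation after the commit or takes another approach is not stated here; the second method only shows one ordering that avoids the stale entry.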

{
  "href": "http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1?fields=*,tasks/*",
  "Stage": {
    "cluster_name": "cl1",
    "context": "Stop YARN Queues",
    "display_status": "IN_PROGRESS",
    "end_time": -1,
    "progress_percent": 35,
    "request_id": 61,
    "skippable": true,
    "stage_id": 1,
    "start_time": 1456227329191,
    "status": "IN_PROGRESS"
  },
  "tasks": [
    {
      "href": "http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1/tasks/754",
      "Tasks": {
        "attempt_cnt": 1,
        "cluster_name": "cl1",
        "command": "EXECUTE",
        "command_detail": "Before continuing, please stop all YARN queues. If yarn-site's yarn.resourcemanager.work-preserving-recovery.enabled is set to true, then you can skip this step since the clients will retry on their own.",
        "custom_command_name": "org.apache.ambari.server.serveraction.upgrades.ManualStageAction",
        "end_time": -1,
        "error_log": "errors-754.txt",
        "exit_code": 0,
        "host_name": "os-r6-mkqzcs-c10tom21unsecha-6.novalocal",
        "id": 754,
        "output_log": "output-754.txt",
        "request_id": 61,
        "role": "AMBARI_SERVER_ACTION",
        "stage_id": 1,
        "start_time": 1456227329191,
        "status": "HOLDING",
        "stderr": "",
        "stdout": "",
        "structured_out": {}
      }
    }
  ]
}

Issue Links

is broken by

AMBARI-15011 Decrease the load on ambari database after cluster creation

Resolved

links to

Reviewboard

People

Assignee: Jonathan Hurley

Reporter: Jonathan Hurley

Dates

Created: 24/Feb/16 23:00

Updated: 01/Mar/16 21:15

Resolved: 01/Mar/16 18:33