Ambari / AMBARI-12850

Downgrades That Are Retried With Unhealthy Hosts Can Produce Multiple Stages In Progress

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.1.2
    • Component/s: ambari-server
    • Labels: None

    Description

      When performing a downgrade from HDP 2.3 to HDP 2.2, the web client sometimes does not show the Retry button after a failure. The problem stems from two issues:

      • When a stage is retried while some of its hosts are not heartbeating, the entire stage is automatically aborted.
      • Task updates within a stage are not performed atomically.
      {
        "Upgrade" : {
          "cluster_name" : "c1",
          "request_id" : 21
        },
        "upgrade_groups" : [
          {
            "UpgradeGroup" : {
              "completed_task_count" : 2,
              "group_id" : 22,
              "in_progress_task_count" : 0,
              "name" : "CORE_SLAVES",
              "progress_percent" : 100.0,
              "request_id" : 21,
              "status" : "COMPLETED",
              "title" : "Core Slaves",
              "total_task_count" : 2
            }
          },
          {
            "UpgradeGroup" : {
              "completed_task_count" : 5,
              "group_id" : 23,
              "in_progress_task_count" : 3,
              "name" : "CORE_MASTER",
              "progress_percent" : 81.42857142857143,
              "request_id" : 21,
              "status" : "HOLDING_FAILED",
              "title" : "Core Masters",
              "total_task_count" : 8
            }
          },
          {
            "UpgradeGroup" : {
              "completed_task_count" : 2,
              "group_id" : 24,
              "in_progress_task_count" : 1,
              "name" : "ZOOKEEPER",
              "progress_percent" : 66.66666666666666,
              "request_id" : 21,
              "status" : "IN_PROGRESS",
              "title" : "ZooKeeper",
              "total_task_count" : 3
            }
          },
          {
            "UpgradeGroup" : {
              "completed_task_count" : 5,
              "group_id" : 25,
              "in_progress_task_count" : 0,
              "name" : "POST_CLUSTER",
              "progress_percent" : 100.0,
              "request_id" : 21,
              "status" : "COMPLETED",
              "title" : "Finalize Downgrade",
              "total_task_count" : 5
            }
          }
        ]
      }
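
      The collision can be spotted mechanically from the JSON above. The following is a minimal sketch, not Ambari code: the helper name `groups_with_in_progress_tasks` is hypothetical, and the payload is a trimmed copy of the state shown above that keeps only the fields the check needs.

```python
import json

# Trimmed copy of the upgrade state shown above (only the fields used here).
RESPONSE = json.loads("""
{
  "upgrade_groups": [
    {"UpgradeGroup": {"name": "CORE_SLAVES",  "status": "COMPLETED",      "in_progress_task_count": 0}},
    {"UpgradeGroup": {"name": "CORE_MASTER",  "status": "HOLDING_FAILED", "in_progress_task_count": 3}},
    {"UpgradeGroup": {"name": "ZOOKEEPER",    "status": "IN_PROGRESS",    "in_progress_task_count": 1}},
    {"UpgradeGroup": {"name": "POST_CLUSTER", "status": "COMPLETED",      "in_progress_task_count": 0}}
  ]
}
""")


def groups_with_in_progress_tasks(response):
    """Return the names of upgrade groups that still report in-progress tasks."""
    return [
        entry["UpgradeGroup"]["name"]
        for entry in response["upgrade_groups"]
        if entry["UpgradeGroup"]["in_progress_task_count"] > 0
    ]


active = groups_with_in_progress_tasks(RESPONSE)
# More than one entry means several groups are in flight at the same time.
print(active, len(active) > 1)
```

      For the state above, both CORE_MASTER (HOLDING_FAILED) and ZOOKEEPER (IN_PROGRESS) are reported, which matches the "multiple stages in progress" symptom in the title.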
      

      An upgrade group is in the IN_PROGRESS state even though no upgrade items are in the IN_PROGRESS state, which leads to the situation described above. After an upgrade fails, all IN_PROGRESS groups and items should be transitioned to the ABORTED state, so this is a backend (BE) issue that must be fixed there to avoid such a collision.
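
      The expected behavior can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `Group` class, the `abort_after_failure` function, and the single lock are hypothetical and do not reflect the contents of AMBARI-12850.patch or Ambari's actual classes. It shows the idea of moving every IN_PROGRESS group and item to ABORTED inside one critical section, so that readers never observe a half-updated state.

```python
import threading
from dataclasses import dataclass, field
from typing import List


@dataclass
class Group:
    """Hypothetical stand-in for an upgrade group and the statuses of its items."""
    name: str
    status: str
    item_statuses: List[str] = field(default_factory=list)


# Assumption: one lock guards all status updates in this toy model.
_state_lock = threading.Lock()


def abort_after_failure(groups):
    """Move every IN_PROGRESS group and item to ABORTED in one critical section."""
    with _state_lock:
        for group in groups:
            group.item_statuses = [
                "ABORTED" if status == "IN_PROGRESS" else status
                for status in group.item_statuses
            ]
            if group.status == "IN_PROGRESS":
                group.status = "ABORTED"
    return groups


groups = [
    Group("CORE_MASTER", "HOLDING_FAILED", ["COMPLETED", "HOLDING_FAILED"]),
    Group("ZOOKEEPER", "IN_PROGRESS", ["COMPLETED", "IN_PROGRESS"]),
]
abort_after_failure(groups)
print([(g.name, g.status, g.item_statuses) for g in groups])
```

      Groups and items in any other state, such as HOLDING_FAILED or COMPLETED, are left untouched, so the failed item remains available for a retry.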

      Attachments

        1. AMBARI-12850.patch
          44 kB
          Jonathan Hurley

            People

              Assignee: Jonathan Hurley
              Reporter: Jonathan Hurley
              Votes: 0
              Watchers: 2