Ambari / AMBARI-12850

Downgrades That Are Retried With Unhealthy Hosts Can Produce Multiple Stages In Progress

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.1.2
    • Component/s: ambari-server
    • Labels:
      None

      Description

      When performing a downgrade from HDP 2.3 to HDP 2.2, the web client sometimes does not show the Retry button after a failure. The problem stems from two issues:

      • When a stage is retried while some hosts are not heartbeating, the entire stage is automatically aborted.
      • Task updates within a stage are not performed atomically.
      {
        "Upgrade" : {
          "cluster_name" : "c1",
          "request_id" : 21
        },
        "upgrade_groups" : [
          {
            "UpgradeGroup" : {
              "completed_task_count" : 2,
              "group_id" : 22,
              "in_progress_task_count" : 0,
              "name" : "CORE_SLAVES",
              "progress_percent" : 100.0,
              "request_id" : 21,
              "status" : "COMPLETED",
              "title" : "Core Slaves",
              "total_task_count" : 2
            }
          },
          {
            "UpgradeGroup" : {
              "completed_task_count" : 5,
              "group_id" : 23,
              "in_progress_task_count" : 3,
              "name" : "CORE_MASTER",
              "progress_percent" : 81.42857142857143,
              "request_id" : 21,
              "status" : "HOLDING_FAILED",
              "title" : "Core Masters",
              "total_task_count" : 8
            }
          },
          {
            "UpgradeGroup" : {
              "completed_task_count" : 2,
              "group_id" : 24,
              "in_progress_task_count" : 1,
              "name" : "ZOOKEEPER",
              "progress_percent" : 66.66666666666666,
              "request_id" : 21,
              "status" : "IN_PROGRESS",
              "title" : "ZooKeeper",
              "total_task_count" : 3
            }
          },
          {
            "UpgradeGroup" : {
              "completed_task_count" : 5,
              "group_id" : 25,
              "in_progress_task_count" : 0,
              "name" : "POST_CLUSTER",
              "progress_percent" : 100.0,
              "request_id" : 21,
              "status" : "COMPLETED",
              "title" : "Finalize Downgrade",
              "total_task_count" : 5
            }
          }
        ]
      }
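
      The inconsistency in the response above can be spotted mechanically. The following is a minimal sketch, not part of the patch: it assumes that a group whose status is IN_PROGRESS or HOLDING_FAILED counts as unfinished, and the helper names (unfinished_groups, has_conflict) are hypothetical. The payload is trimmed to the name and status fields of the four groups shown above.

```python
import json

# Assumption for this sketch: these two statuses (the only non-COMPLETED ones
# in the response above) mean the group is still active or awaiting a decision.
UNFINISHED = {"IN_PROGRESS", "HOLDING_FAILED"}

# Trimmed copy of the response above: only name and status are kept.
payload = json.loads("""
{
  "Upgrade": {"cluster_name": "c1", "request_id": 21},
  "upgrade_groups": [
    {"UpgradeGroup": {"name": "CORE_SLAVES", "status": "COMPLETED"}},
    {"UpgradeGroup": {"name": "CORE_MASTER", "status": "HOLDING_FAILED"}},
    {"UpgradeGroup": {"name": "ZOOKEEPER", "status": "IN_PROGRESS"}},
    {"UpgradeGroup": {"name": "POST_CLUSTER", "status": "COMPLETED"}}
  ]
}
""")


def unfinished_groups(response):
    """Return (name, status) for every upgrade group that is not finished."""
    return [
        (entry["UpgradeGroup"]["name"], entry["UpgradeGroup"]["status"])
        for entry in response.get("upgrade_groups", [])
        if entry["UpgradeGroup"]["status"] in UNFINISHED
    ]


def has_conflict(response):
    """True when more than one group is unfinished at the same time."""
    return len(unfinished_groups(response)) > 1


if __name__ == "__main__":
    for name, status in unfinished_groups(payload):
        print(name, status)
    print("conflict:", has_conflict(payload))
```

      For this payload the check flags CORE_MASTER and ZOOKEEPER together, which is the "multiple stages in progress" condition named in the issue title.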
      

      Here an upgrade group is reported as IN_PROGRESS even though no upgrade items are IN_PROGRESS, which produces the situation described above. After an upgrade fails, all IN_PROGRESS groups and items should be transitioned to the ABORTED state, so this is a back-end (BE) issue and must be fixed on the server to avoid such a collision.
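
      The rule stated above (every IN_PROGRESS item moves to ABORTED, and the change is applied as one step) can be illustrated with a small in-memory sketch. This is not the code from AMBARI-12850.patch, which changes Java classes in ambari-server. The StageTasks class, its methods, and the task ids are hypothetical, and a lock stands in for whatever atomicity mechanism the server actually uses.

```python
import threading


class StageTasks:
    """Hypothetical in-memory stand-in for the tasks of one stage."""

    def __init__(self, statuses):
        self._lock = threading.Lock()
        self._statuses = dict(statuses)  # task id -> status string

    def abort_in_progress(self):
        """Move every IN_PROGRESS task to ABORTED while holding the lock.

        Readers that use snapshot() see either the state before the abort
        or the state after it, never a partially updated stage.
        """
        with self._lock:
            to_abort = sorted(
                task_id
                for task_id, status in self._statuses.items()
                if status == "IN_PROGRESS"
            )
            for task_id in to_abort:
                self._statuses[task_id] = "ABORTED"
            return to_abort

    def snapshot(self):
        """Return a consistent copy of all task statuses."""
        with self._lock:
            return dict(self._statuses)


# Illustrative task ids; they do not come from the issue.
stage = StageTasks({101: "COMPLETED", 102: "IN_PROGRESS", 103: "IN_PROGRESS"})
aborted = stage.abort_in_progress()

if __name__ == "__main__":
    print("aborted:", aborted)
    print("statuses:", stage.snapshot())
```

      Because reads and writes share one lock, a client polling the stage cannot observe a mix of ABORTED and IN_PROGRESS tasks from a half-finished update, which is the kind of collision the description attributes to non-atomic task updates.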

      1. AMBARI-12850.patch
        44 kB
        Jonathan Hurley

          Activity

          hudson Hudson added a comment -

          SUCCESS: Integrated in Ambari-branch-2.1 #408 (See https://builds.apache.org/job/Ambari-branch-2.1/408/)
          AMBARI-12850 - Downgrades That Are Retried With Unhealthy Hosts Can Produce Multiple Stages In Progress (jonathanhurley) (jhurley: http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=d81b0f97dc32087c028b7eed5cd8680ea24e3d23)

          • ambari-server/src/main/java/org/apache/ambari/server/api/services/UpgradeGroupService.java
          • ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeGroupResourceProvider.java
          • ambari-server/src/main/java/org/apache/ambari/server/orm/dao/HostRoleCommandDAO.java
          • ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java
          • ambari-server/src/test/java/org/apache/ambari/server/controller/internal/StageResourceProviderTest.java
          • ambari-server/src/main/java/org/apache/ambari/server/controller/internal/StageResourceProvider.java
          • ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeItemResourceProvider.java
          • ambari-server/src/main/java/org/apache/ambari/server/state/svccomphost/ServiceComponentHostImpl.java
          • ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java
          hudson Hudson added a comment -

          ABORTED: Integrated in Ambari-trunk-Commit #3308 (See https://builds.apache.org/job/Ambari-trunk-Commit/3308/)
          AMBARI-12850 - Downgrades That Are Retried With Unhealthy Hosts Can Produce Multiple Stages In Progress (jonathanhurley) (jhurley: http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=4ab90629f37a1fe9f817f4a134584035018b879e)

          • ambari-server/src/main/java/org/apache/ambari/server/api/services/UpgradeGroupService.java
          • ambari-server/src/test/java/org/apache/ambari/server/controller/internal/StageResourceProviderTest.java
          • ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java
          • ambari-server/src/main/java/org/apache/ambari/server/controller/internal/StageResourceProvider.java
          • ambari-server/src/main/java/org/apache/ambari/server/state/svccomphost/ServiceComponentHostImpl.java
          • ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java
          • ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeGroupResourceProvider.java
          • ambari-server/src/main/java/org/apache/ambari/server/orm/dao/HostRoleCommandDAO.java
          • ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeItemResourceProvider.java

            People

            • Assignee:
              jonathan.hurley Jonathan Hurley
              Reporter:
              jonathan.hurley Jonathan Hurley
            • Votes:
              0
              Watchers:
              2
