Ambari / AMBARI-19617

Restarting Some Components During a Suspended Upgrade Fails Due To Missing Upgrade Parameters


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.5.0
    • Fix Version/s: 2.5.0
    • Component/s: ambari-server
    • Labels: None

    Description

      When a component with complicated upgrade logic is restarted while an upgrade is suspended, the upgrade parameters are not sent to the agents. This can cause some components to fail during the restart.

      Example:

      • Began an express upgrade from 2.3.6.0-3796 to 2.5.3.0-37.
      • HIVE_METASTORE couldn't start because of a missing Kerberos property:
        resource_management.core.exceptions.Fail: Configuration parameter 'hive.server2.authentication.kerberos.principal' was not found in configurations dictionary!
        
      • Chose Ignore and Proceed, which meant that none of the Metastore SQL files ran.
      • Paused the upgrade (presumably at Finalize) and tried to start Metastore. It failed to start because the new HDP 2.5 bits were running against a non-upgraded database. That caused the -info option to fail and made Ambari think it needed to run -initSchema.

      RCA: Metastore failed to start during the upgrade and the admin chose to skip it, so the schema upgrade logic never ran. Ambari can examine the upgrade_suspended property to determine whether it needs to run upgrade commands when restarting Metastore during an upgrade.

      However, it might be more prudent to simply send along the suspended upgrade properties with the restart command, so that any actions that might need to happen (such as invoking an upgrade script during the restart) can still happen while the upgrade is suspended.
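
      The idea of sending the suspended upgrade properties along with a restart can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patch: the function names, the shape of the upgrade dictionary, and every parameter key other than upgrade_suspended are assumptions made for the example.

```python
from typing import Optional


def build_restart_params(base_params: dict, upgrade: Optional[dict]) -> dict:
    """Copy the restart parameters, adding upgrade context if an upgrade is suspended.

    `upgrade` is a hypothetical dictionary describing the in-progress upgrade;
    None means no upgrade is in progress.
    """
    params = dict(base_params)  # copy so the caller's dictionary is not mutated
    if upgrade is not None and upgrade.get("suspended", False):
        # upgrade_suspended is the property named in the description.
        params["upgrade_suspended"] = "true"
        # The remaining key names are assumptions made for this sketch.
        params["upgrade_direction"] = upgrade["direction"]
        params["source_version"] = upgrade["source_version"]
        params["target_version"] = upgrade["target_version"]
    return params


def should_run_upgrade_steps(params: dict) -> bool:
    """Agent-side check: run upgrade logic on restart only if the context was sent."""
    return params.get("upgrade_suspended") == "true"


# Versions taken from the example in the description.
suspended_upgrade = {
    "suspended": True,
    "direction": "upgrade",
    "source_version": "2.3.6.0-3796",
    "target_version": "2.5.3.0-37",
}

restart = build_restart_params({"role": "HIVE_METASTORE"}, suspended_upgrade)
plain_restart = build_restart_params({"role": "HIVE_METASTORE"}, None)

print(should_run_upgrade_steps(restart))
print(should_run_upgrade_steps(plain_restart))
```

      With the upgrade context attached, a component script can decide for itself whether a skipped step (such as the Metastore schema upgrade) still needs to run before the start, instead of the server special-casing each component.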

      Attachments

        1. AMBARI-19617.patch
          189 kB
          Jonathan Hurley

          People

            Assignee: Jonathan Hurley (jonathanhurley)
            Reporter: Jonathan Hurley (jonathanhurley)