  1. Ambari
  2. AMBARI-19617

Restarting Some Components During a Suspended Upgrade Fails Due To Missing Upgrade Parameters


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.5.0
    • Fix Version/s: 2.5.0
    • Component/s: ambari-server
    • Labels: None

    Description

      When restarting a component that has complicated upgrade logic, the upgrade parameters are not sent to the agents. This can cause some components to fail when they are restarted during a suspended upgrade.

      Example:

      • Began an express upgrade from 2.3.6.0-3796 to 2.5.3.0-37.
      • HIVE_METASTORE couldn't start because of a missing Kerberos property:
        resource_management.core.exceptions.Fail: Configuration parameter 'hive.server2.authentication.kerberos.principal' was not found in configurations dictionary!
        
      • Chose to Ignore and Proceed, which meant that none of the Metastore SQL files ran.
      • Paused the upgrade (presumably at Finalize) and tried to start Metastore. It failed to start because the new HDP 2.5 bits were running against a non-upgraded database. That caused the schema tool's -info option to fail, which made Ambari think it needed to run -initSchema.

      RCA: Metastore failed to start during upgrade and the admin chose to skip it. This caused schema upgrade logic not to run. Ambari can examine the upgrade_suspended property to determine if we need to run upgrade commands while restarting Metastore during an upgrade.
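      The decision described in the RCA can be sketched roughly as follows. This is an illustrative sketch only, not the code from the patch: the function name and its boolean arguments are hypothetical, and the returned strings are the Hive schema tool options (-initSchema is named above; -upgradeSchema is the corresponding upgrade option).

```python
def choose_schema_action(info_succeeded, upgrade_suspended):
    """Pick the schema tool option to run before starting Metastore.

    Hypothetical helper for illustration; both arguments are assumptions:
      info_succeeded    -- True if the schema tool's -info check passed
      upgrade_suspended -- True if the upgrade_suspended property is set
    """
    if info_succeeded:
        # The schema already matches the installed bits; nothing to do.
        return None
    if upgrade_suspended:
        # A failed -info during a suspended upgrade most likely means the
        # schema upgrade was skipped, so upgrade rather than re-initialize.
        return "-upgradeSchema"
    # Outside of an upgrade, a failed -info is treated as "no schema yet".
    return "-initSchema"
```

      With the scenario from this issue (the -info check fails while the upgrade is suspended), the sketch selects -upgradeSchema instead of -initSchema.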

      However, it might be more prudent to simply send along the suspended upgrade properties so that any actions which might need to happen (such as invoking an upgrade script during the restart) can happen when the upgrade is suspended.
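      One way to picture "sending along the suspended upgrade properties" is the sketch below. It is a simplified illustration under stated assumptions, not the actual ambari-server implementation: the function name, the dictionary layout, and every key other than upgrade_suspended are hypothetical.

```python
def build_restart_params(base_params, suspended_upgrade=None):
    """Return the parameters to send to the agent with a restart command.

    Hypothetical sketch: `suspended_upgrade` is None when no upgrade is
    suspended, otherwise a dict with assumed keys "direction" and
    "target_version".
    """
    params = dict(base_params)  # copy so the caller's dict is not mutated
    if suspended_upgrade is not None:
        # Property named in the issue description.
        params["upgrade_suspended"] = "true"
        # Assumed key names, used here only for illustration.
        params["upgrade_direction"] = suspended_upgrade["direction"]
        params["version"] = suspended_upgrade["target_version"]
    return params


# Example: restarting Metastore while the upgrade to 2.5.3.0-37 is suspended.
restart_params = build_restart_params(
    {"component": "HIVE_METASTORE"},
    {"direction": "upgrade", "target_version": "2.5.3.0-37"},
)
```

      Because the agent receives the same upgrade context it would have had during the upgrade itself, any upgrade-specific step in the component's restart logic can still run.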

      Attachments

        1. AMBARI-19617.patch
          189 kB
          Jonathan Hurley

            People

              Assignee: Jonathan Hurley (jonathanhurley)
              Reporter: Jonathan Hurley (jonathanhurley)
              Votes: 0
              Watchers: 3
