Ambari / AMBARI-20736

Allow Potentially Long Running Restart Commands To Have Their Own Timeout



    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.5.1
    • Component/s: ambari-server
    • Labels: None


      During a cluster upgrade, some commands are expected to take a very long time, depending on the size of the cluster and how much data is stored. For example, a NameNode restart with SafeMode exit may take in excess of 30 minutes on a large cluster, while on a small one it could take less than 1 minute.

      Currently, the only way to adjust this timeout is globally, for all commands, by setting agent.task.timeout in ambari.properties. This does not work well: most restarts during an upgrade are not on master components, so a timeout long enough for a NameNode restart would also apply to every short-running slave restart.

      There needs to be a way to tell Ambari that a specific restart is allowed to run for a relatively long time.

      • Both the Java side (Ambari Server) and the Python side (Ambari Agent) need to be considered. The Python command must not give up and return a FAILED state, and the server must not mark the task as TIMEDOUT.
      • This can be useful in both normal restarts and upgrade scenarios.
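      A minimal sketch of the timeout selection these two requirements imply, assuming a per-command override that falls back to the global agent.task.timeout value. The function names, the illustrative 900-second global value, and the 60-second server-side grace period are hypothetical, not Ambari's actual implementation. The point is only that the agent (Python) and the server (Java) should derive their limits from the same override, with the server waiting slightly longer so it never marks a still-running task as TIMEDOUT.

```python
from typing import Optional

# Hypothetical values for illustration only.
GLOBAL_TASK_TIMEOUT = 900   # stands in for agent.task.timeout, in seconds
SERVER_GRACE_SECONDS = 60   # hypothetical slack so the server never times out first


def agent_timeout(command_timeout: Optional[int]) -> int:
    """Seconds the agent-side (Python) command may run before giving up.

    A per-command override wins; otherwise fall back to the global value.
    """
    if command_timeout is not None:
        if command_timeout <= 0:
            raise ValueError("timeout must be a positive number of seconds")
        return command_timeout
    return GLOBAL_TASK_TIMEOUT


def server_timeout(command_timeout: Optional[int]) -> int:
    """Seconds the server-side (Java) scheduler waits before marking TIMEDOUT.

    Derived from the same override as the agent, plus a grace period, so a
    long restart is not declared TIMEDOUT while the agent is still waiting.
    """
    return agent_timeout(command_timeout) + SERVER_GRACE_SECONDS


# A NameNode restart with SafeMode exit gets a long override;
# an ordinary slave restart keeps the global value.
print(agent_timeout(1800), server_timeout(1800))  # 1800 1860
print(agent_timeout(None), server_timeout(None))  # 900 960
```

      Deriving both limits from one value avoids the two sides drifting apart, which is exactly the FAILED/TIMEDOUT mismatch described above.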

      Upgrade Only

      If this functionality is considered only in the context of an upgrade, the logic could be placed in the upgrade pack XML files:

          <service name="HDFS">
            <component name="NAMENODE">
                <task xsi:type="restart-task" timeout="1800"/>
            </component>
          </service>
      • This would allow future management packs (mpacks) to control the restart timeouts of their components. Perhaps this can even be slightly abstracted out by referring to a named parameter instead of a literal value:
          <service name="HDFS">
            <component name="NAMENODE">
                <task xsi:type="restart-task" timeout="upgrade.parameter.master.restart.long"/>
            </component>
          </service>
      with the named timeouts (in seconds) defined once, for example:
      upgrade.parameter.slave.restart.short = 300
      upgrade.parameter.slave.restart.long = 900
      upgrade.parameter.master.restart.short = 1500
      upgrade.parameter.master.restart.long = 1800
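      Both variants can be handled by a single resolution step: if the timeout attribute parses as an integer it is used directly; otherwise it is treated as the name of an upgrade parameter and looked up. The sketch below assumes the parameters are stored as simple "key = value" lines like the ones above and that values are in seconds. The helper names, and the xmlns:xsi declaration added so the snippet parses on its own, are illustrative assumptions rather than Ambari's actual upgrade pack handling.

```python
import xml.etree.ElementTree as ET

# ElementTree reports namespaced attributes in "{uri}local" form.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# The snippet from above, with the xsi namespace declared so it parses standalone.
UPGRADE_XML = """
<service name="HDFS" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <component name="NAMENODE">
    <task xsi:type="restart-task" timeout="upgrade.parameter.master.restart.long"/>
  </component>
</service>
"""

PARAMETERS = """
upgrade.parameter.slave.restart.short = 300
upgrade.parameter.slave.restart.long = 900
upgrade.parameter.master.restart.short = 1500
upgrade.parameter.master.restart.long = 1800
"""


def parse_parameters(text: str) -> dict[str, int]:
    """Parse 'key = value' lines into a name -> seconds mapping (hypothetical format)."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = int(value.strip())
    return params


def resolve_timeout(raw: str, params: dict[str, int]) -> int:
    """Return a literal timeout as-is, or look up a named upgrade parameter."""
    if raw.isdigit():
        return int(raw)
    if raw not in params:
        raise KeyError(f"unknown timeout parameter: {raw}")
    return params[raw]


def restart_timeouts(xml_text: str, params: dict[str, int]) -> dict[str, int]:
    """Map each component that has a timed restart-task to its resolved timeout."""
    root = ET.fromstring(xml_text.strip())
    timeouts = {}
    for component in root.iter("component"):
        for task in component.iter("task"):
            raw = task.get("timeout")
            if task.get(XSI_TYPE) == "restart-task" and raw:
                timeouts[component.get("name")] = resolve_timeout(raw, params)
    return timeouts


params = parse_parameters(PARAMETERS)
print(restart_timeouts(UPGRADE_XML, params))  # {'NAMENODE': 1800}
print(resolve_timeout("1800", params))        # 1800
```

      Keeping the literal form alongside the named form means existing upgrade packs with a plain number keep working, while an unknown parameter name fails loudly instead of silently falling back to the global timeout.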


        Attachment: AMBARI-20736.patch (32 kB, Nate Cole)




              Assignee: Nate Cole (ncole@hortonworks.com)
              Reporter: Nate Cole (ncole@hortonworks.com)