AMBARI-10029 (Ambari)

Node auto-recovery


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.1.0
    • Component/s: ambari-agent, ambari-server
    • Labels: None

    Description

      Using Blueprints, it is possible to perform a zero-touch install of Hadoop clusters with Ambari. This is especially useful in cloud environments. However, cloud environments can also be dynamic: nodes may be rebooted or reset to their original image.

      A reset means that the node (usually a VM) is reverted to the original state in which it joined the cluster. A reset node is assumed to have ambari-agent installed and configured to communicate with the server. The node may also have all packages pre-installed.

      Node recovery is the feature that brings a rebooted or reset node back online by starting, or installing and then starting, the host components that are already assigned to that host.

      In general, temporarily losing a slave node and then recovering it should not affect the cluster as a whole. If it is a master node, there may be some disruption, depending on which services are deployed on the master host and whether HA is enabled for those master services.

      Node recovery, as discussed in this JIRA, only addresses the ability to automatically INSTALL/CONFIGURE/START host components on the node so that the actual state of each host component matches its desired state.
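The recovery described above amounts to reconciling, per host component, the actual state reported from the node with the desired state the server holds. The Python sketch below illustrates that idea only. The ordered lifecycle INIT, INSTALLED, CONFIGURED, STARTED and the names `State`, `recovery_commands` and `plan_node_recovery` are hypothetical simplifications, not Ambari's actual state model or API. It also assumes that a component absent from the node's report (as on a freshly reset node) is treated as INIT.

```python
from enum import IntEnum
from typing import Dict, List


class State(IntEnum):
    """Hypothetical, simplified and ordered lifecycle of a host component."""
    INIT = 0        # nothing installed (e.g. node reset to its base image)
    INSTALLED = 1   # packages present, component not configured/running
    CONFIGURED = 2  # configuration files written
    STARTED = 3     # process running


# Hypothetical command that advances a component one step from the given state.
_STEP_COMMANDS = {
    State.INIT: "INSTALL",
    State.INSTALLED: "CONFIGURE",
    State.CONFIGURED: "START",
}


def recovery_commands(desired: State, actual: State) -> List[str]:
    """Return the commands that move `actual` forward until it equals `desired`.

    Only forward recovery is modelled: if the component is already at or
    beyond the desired state, no command is issued.
    """
    commands: List[str] = []
    state = actual
    while state < desired:
        commands.append(_STEP_COMMANDS[state])
        state = State(state + 1)
    return commands


def plan_node_recovery(desired: Dict[str, State],
                       actual: Dict[str, State]) -> Dict[str, List[str]]:
    """Plan recovery for every host component assigned to one node.

    Components missing from `actual` are assumed to be in INIT
    (the reset-node case). Components needing no action are omitted.
    """
    plan: Dict[str, List[str]] = {}
    for component, want in desired.items():
        have = actual.get(component, State.INIT)
        cmds = recovery_commands(want, have)
        if cmds:
            plan[component] = cmds
    return plan


if __name__ == "__main__":
    desired = {
        "DATANODE": State.STARTED,
        "NODEMANAGER": State.STARTED,
        "HDFS_CLIENT": State.INSTALLED,
    }
    # DATANODE survived a reboot with its configuration on disk;
    # the other components were not reported by the node.
    actual = {"DATANODE": State.CONFIGURED}
    print(plan_node_recovery(desired, actual))
```

In this model a rebooted DataNode whose configuration is still on disk only needs START, whereas an unreported component is taken through INSTALL, CONFIGURE and START. The sketch never moves a component backwards (stop or uninstall), which matches the scope of this JIRA.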

    Attachments

        1. AMBARI-10029.patch (119 kB, Sumit Mohanty)
        2. AMBARI-10029.p-II.patch (40 kB, Sumit Mohanty)
        3. NodeRecovery.pdf (341 kB, Sumit Mohanty)


    People

        Assignee: Sumit Mohanty
        Reporter: Sumit Mohanty
        Votes: 1
        Watchers: 7
