Ambari / AMBARI-24380

Ambari HBase Rolling Restart failed to check RegionServers restarted successfully, continued to take down rest of RegionServers!

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 2.5.2
    • Fix Version/s: None
    • Component/s: ambari-server
    • Labels: None

    Description

      An Ambari rolling restart of HBase RegionServers failed to detect that the restarted RegionServers were not coming back online and continued to take down the rest of the RegionServers in the cluster.

      The trigger was adding the following JVM tuning setting to HBASE_OPTS in the hbase-env.sh template, near the start of the options:

      -XX:G1NewSizePercent=3

      This placed it before the following option, which was set a couple of options further along; the new setting needs to come after this option:

      -XX:+UnlockExperimentalVMOptions

      This resulted in both HMaster and RegionServer startup failures, but Ambari did not detect that the RegionServers were not coming back online and proceeded to take down the rest of the RegionServers.
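The ordering problem described above can be caught before a restart is attempted. The following is a minimal sketch, not part of Ambari or HBase: `flags_before_unlock` is a hypothetical helper, and the set of experimental flag names is supplied by the caller (here it contains only `G1NewSizePercent`, the flag named in this report).

```python
import shlex

UNLOCK_FLAG = "-XX:+UnlockExperimentalVMOptions"


def flags_before_unlock(opts, experimental_names):
    """Return the experimental -XX flags that appear before the unlock flag.

    opts is the option string as it would appear in HBASE_OPTS.
    experimental_names is a caller-supplied set of flag names (an assumption
    of this sketch; it is not queried from the JVM).
    If the unlock flag is absent, every listed experimental flag is returned.
    """
    misplaced = []
    for token in shlex.split(opts):
        if token == UNLOCK_FLAG:
            break  # everything after this point is correctly ordered
        if token.startswith("-XX:"):
            # Drop the "-XX:" prefix, an optional +/- toggle, and any "=value".
            name = token[4:].lstrip("+-").split("=", 1)[0]
            if name in experimental_names:
                misplaced.append(token)
    return misplaced
```

Given an option string in which `-XX:G1NewSizePercent=3` precedes `-XX:+UnlockExperimentalVMOptions`, the helper returns that flag; with the two in the opposite order it returns an empty list.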

      Ambari should have checked that the first RegionServer restarted successfully, stayed up for the default 120-second rolling window (via API checks on the RegionServer), and properly re-registered with the active HMaster before moving on to the second RegionServer.
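The requested behaviour can be sketched as a gate between hosts. This is an illustration only and does not reflect Ambari's actual implementation: `restart` and `is_healthy` are hypothetical callables supplied by the caller, where `is_healthy(host)` is assumed to return True only when the RegionServer is up and registered with the active HMaster. The `start_timeout` and `poll_secs` values are also assumptions; only the 120-second window comes from the report.

```python
import time


def wait_until_stable(host, is_healthy, start_timeout, stable_secs,
                      poll_secs, clock, sleep):
    """Return True once host has been continuously healthy for stable_secs."""
    start_deadline = clock() + start_timeout
    healthy_since = None
    while True:
        now = clock()
        if is_healthy(host):
            if healthy_since is None:
                healthy_since = now
            if now - healthy_since >= stable_secs:
                return True
        elif healthy_since is not None:
            return False  # came up, then went down inside the window
        elif now >= start_deadline:
            return False  # never came up at all
        sleep(poll_secs)


def rolling_restart(hosts, restart, is_healthy, start_timeout=300,
                    stable_secs=120, poll_secs=5,
                    clock=time.monotonic, sleep=time.sleep):
    """Restart hosts one at a time and stop at the first one that fails.

    Returns (hosts_restarted_ok, failed_host); failed_host is None when
    every host passed the stability check.
    """
    done = []
    for host in hosts:
        restart(host)
        if not wait_until_stable(host, is_healthy, start_timeout,
                                 stable_secs, poll_secs, clock, sleep):
            return done, host  # abort: remaining hosts are left untouched
        done.append(host)
    return done, None
```

The key design point is that a failed check returns immediately, so the hosts that have not yet been restarted keep serving regions.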

      Also, Ambari should refuse to continue with any rolling restart if no HMasters are online; see linked ticket AMBARI-24699.
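A precondition of this kind could look like the sketch below. The function and exception names are hypothetical, and the input is assumed to be a mapping of HMaster host name to an online flag that the caller has already gathered; how that state is obtained is outside the scope of this sketch.

```python
class NoHMasterOnlineError(RuntimeError):
    """Raised when a rolling restart is requested with no HMaster online."""


def require_online_hmaster(master_states):
    """Return the sorted list of online HMaster hosts, or raise if none.

    master_states maps host name -> True if that HMaster is online.
    """
    online = sorted(host for host, up in master_states.items() if up)
    if not online:
        raise NoHMasterOnlineError(
            "refusing rolling restart: no HMaster is online")
    return online
```

Calling this before the first RegionServer is touched, and again before each subsequent one, would stop the restart as soon as the last HMaster goes away.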

            People

              Assignee: Unassigned
              Reporter: Hari Sekhon (harisekhon)
              Votes: 0
              Watchers: 2
