HBase / HBASE-6389

Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.94.0, 0.95.2
    • Fix Version/s: 0.94.3, 0.95.0
    • Component/s: master
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Reverts the cluster startup behavior to pre-0.94.0.

      With this, the Master will wait until "hbase.master.wait.on.regionservers.mintostart" Region Servers have registered with it before it starts region assignment. The default value of this setting is 1.

      In large clusters with thousands of regions, you may want to increase this to a number high enough to handle the task of opening those regions in parallel.

      If left at the default, the Master can at times assign all regions to a single Region Server, which results in slow startup and, in the worst case, can OOM the Region Server (sometimes resulting in META inconsistency).

      Here is how it works now (from the javadoc):

      We wait until one of these conditions is met:
       - the master is stopped
       - the 'hbase.master.wait.on.regionservers.maxtostart' number of region servers is reached
       - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND no new region server has checked in for 'hbase.master.wait.on.regionservers.interval' time AND the 'hbase.master.wait.on.regionservers.timeout' is reached
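
      For example, a minimal sketch of raising the quorum for a large cluster (the value 50 is purely illustrative; the same property can equally be set in hbase-site.xml):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.HBaseConfiguration;

        // Illustrative only: require 50 Region Servers to check in
        // before the Master begins region assignment.
        Configuration conf = HBaseConfiguration.create();
        conf.setInt("hbase.master.wait.on.regionservers.mintostart", 50);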

      Description

      Continuing from HBASE-6375.

      It seems I was mistaken in my assumption that changing the value of "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from the default of 1) can help prevent assignment of all regions to one (or a small number of) region servers.

      While this was the case in 0.90.x and 0.92.x, the behavior changed from 0.94.0 onwards to address HBASE-4993.

      From 0.94.0 onwards, the Master will proceed as soon as the timeout has lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not been reached.

      Reading the current conditions in waitForRegionServers() makes this clear:

      ServerManager.java (trunk rev:1360470)
      ....
        /**
         * Wait for the region servers to report in.
         * We will wait until one of this condition is met:
         *  - the master is stopped
         *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
         *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
         *    region servers is reached
         *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
         *    there have been no new region server in for
         *    'hbase.master.wait.on.regionservers.interval' time
         *
         * @throws InterruptedException
         */
        public void waitForRegionServers(MonitoredTask status)
        throws InterruptedException {
      ....
      ....
          while (
            !this.master.isStopped() &&
              slept < timeout &&
              count < maxToStart &&
              (lastCountChange + interval > now || count < minToStart)
            ){
      ....
      

      So with the current conditions, the wait ends as soon as the timeout is reached, even if fewer RSes have checked in with the Master, and the Master proceeds with region assignment among those RSes alone.

      As mentioned in HBASE-4993, and I concur, this could have a disastrous effect on a large cluster, especially now that MSLAB is turned on.

      To enforce the required quorum as specified by "hbase.master.wait.on.regionservers.mintostart" irrespective of the timeout, these conditions need to be modified as follows:

      ServerManager.java
      ..
        /**
         * Wait for the region servers to report in.
         * We will wait until one of these conditions is met:
         *  - the master is stopped
         *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
         *    region servers is reached
         *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
         *    no new region server has checked in for
         *    'hbase.master.wait.on.regionservers.interval' time AND
         *    the 'hbase.master.wait.on.regionservers.timeout' is reached
         *
         * @throws InterruptedException
         */
        public void waitForRegionServers(MonitoredTask status)
      ..
      ..
          int minToStart = this.master.getConfiguration().
              getInt("hbase.master.wait.on.regionservers.mintostart", 1);
          int maxToStart = this.master.getConfiguration().
              getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
          // maxToStart must never be lower than minToStart
          if (maxToStart < minToStart) {
            maxToStart = minToStart;
          }
      ..
      ..
          while (
            !this.master.isStopped() &&
              count < maxToStart &&
              // the timeout now sits inside the OR, so it cannot end the
              // wait on its own while count < minToStart
              (lastCountChange + interval > now || timeout > slept || count < minToStart)
            ){
      ..
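
      To see the difference, here is a standalone sketch (not from the patch; all values are hypothetical) that evaluates both predicates for a cluster where only 3 of a required 10 RSes have checked in by the time the timeout lapses:

        // WaitConditionDemo.java -- illustrative values only
        public class WaitConditionDemo {
          public static void main(String[] args) {
            int minToStart = 10;                  // ...mintostart
            int maxToStart = Integer.MAX_VALUE;   // ...maxtostart
            long timeout = 4500, interval = 1500; // hypothetical values, in ms

            // Snapshot: 3 RSes registered, the last one 2.5s ago, 5s slept so far.
            int count = 3;
            boolean stopped = false;
            long slept = 5000, now = 10000, lastCountChange = 7500;

            // Current predicate: 'slept < timeout' is a top-level AND term, so
            // the wait ends once the timeout lapses even though count < minToStart.
            boolean currentKeepsWaiting = !stopped && slept < timeout && count < maxToStart
                && (lastCountChange + interval > now || count < minToStart);

            // Proposed predicate: the timeout moves inside the OR, so it can
            // no longer end the wait on its own while count < minToStart.
            boolean proposedKeepsWaiting = !stopped && count < maxToStart
                && (lastCountChange + interval > now || timeout > slept || count < minToStart);

            System.out.println("current keeps waiting: " + currentKeepsWaiting);   // false
            System.out.println("proposed keeps waiting: " + proposedKeepsWaiting); // true
          }
        }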
      
      Attachments

      1. HBASE-6389_0.94.patch
        9 kB
        Aditya Kishore
      2. HBASE-6389_trunk_v2.patch
        11 kB
        Aditya Kishore
      3. HBASE-6389_trunk_v2.patch
        11 kB
        Aditya Kishore
      4. testReplication.jstack
        204 kB
        Ted Yu
      5. org.apache.hadoop.hbase.TestZooKeeper-output.txt
        120 kB
        Ted Yu
      6. HBASE-6389_trunk.patch
        5 kB
        Aditya Kishore
      7. HBASE-6389_trunk.patch
        5 kB
        Aditya Kishore
      8. HBASE-6389_trunk.patch
        3 kB
        Aditya Kishore

        Activity

        Lars Hofhansl made changes -
        Fix Version/s 0.94.3 [ 12323144 ]
        stack made changes -
        Fix Version/s 0.95.0 [ 12324094 ]
        Fix Version/s 0.96.0 [ 12320040 ]
        Fix Version/s 0.94.3 [ 12323144 ]
        Lars Hofhansl made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Lars Hofhansl made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        stack made changes -
        Hadoop Flags Incompatible change, Reviewed [ 10342, 10343 ]
        Release Note (updated; final text shown above)
        Aditya Kishore made changes -
        Fix Version/s 0.94.3 [ 12323144 ]
        Aditya Kishore made changes -
        Attachment HBASE-6389_0.94.patch [ 12551509 ]
        Aditya Kishore made changes -
        Release Note (updated)
        Aditya Kishore made changes -
        Release Note (updated)
        Aditya Kishore made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Aditya Kishore made changes -
        Attachment HBASE-6389_trunk_v2.patch [ 12549763 ]
        Aditya Kishore made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Aditya Kishore made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Aditya Kishore made changes -
        Attachment HBASE-6389_trunk_v2.patch [ 12549692 ]
        Lars Hofhansl made changes -
        Fix Version/s 0.94.2 [ 12321884 ]
        Ted Yu made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Ted Yu made changes -
        Attachment testReplication.jstack [ 12537286 ]
        Ted Yu made changes -
        Attachment org.apache.hadoop.hbase.TestZooKeeper-output.txt
        Aditya Kishore made changes -
        Release Note Resubmitting the patch for trunk with the fixed test.
        Aditya Kishore made changes -
        Status Reopened [ 4 ] Patch Available [ 10002 ]
        Hadoop Flags Reviewed [ 10343 ]
        Release Note Resubmitting the patch for trunk with the fixed test.
        Aditya Kishore made changes -
        Attachment HBASE-6389_trunk.patch [ 12537127 ]
        Lars Hofhansl made changes -
        Fix Version/s 0.94.2 [ 12321884 ]
        Fix Version/s 0.94.1 [ 12320257 ]
        Ted Yu made changes -
        Resolution Fixed [ 1 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        Lars Hofhansl made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Aditya Kishore made changes -
        Attachment HBASE-6389_trunk.patch [ 12536453 ]
        Aditya Kishore made changes -
        Attachment HBASE-6389_trunk.patch [ 12536324 ]
        Aditya Kishore made changes -
        Attachment HBASE-6389_trunk.patch [ 12536317 ]
        Ted Yu made changes -
        Hadoop Flags Reviewed [ 10343 ]
        Lars Hofhansl made changes -
        Fix Version/s 0.94.1 [ 12320257 ]
        Aditya Kishore made changes -
        Summary Modify the conditions to ensure that Master waits for suffcient number of Region Servers before starting region assignments → Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
        Aditya Kishore made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Fix Version/s 0.96.0 [ 12320040 ]
        Fix Version/s 0.94.1 [ 12320257 ]
        Lars Hofhansl made changes -
        Fix Version/s 0.94.1 [ 12320257 ]
        Aditya Kishore made changes -
        Field Original Value New Value
        Attachment HBASE-6389_trunk.patch [ 12536317 ]
        Aditya Kishore created issue -

          People

          • Assignee:
            Aditya Kishore
          • Reporter:
            Aditya Kishore
          • Votes:
            0
          • Watchers:
            12
