HBASE-5008

The clusters can't provide services because Region can't flush.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.90.6, 0.92.0
    • Component/s: regionserver
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      HBase version 0.90.4 + patches

      My analysis is as follows:

      // Split of region b24d8ccb852ff742f2a27d01b7f5853e started and the region began closing.

      2011-12-10 17:32:48,653 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.
      2011-12-10 17:32:49,759 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: disabling compactions & flushes
      2011-12-10 17:32:49,759 INFO org.apache.hadoop.hbase.regionserver.HRegion: Running close preflush of Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.

      // A flush request was processed and skipped, but flushRequested had already been set to true.
      2011-12-10 17:33:06,963 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e., current region memstore size 12.6m
      2011-12-10 17:33:17,277 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Skipping flush on Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e. because closing

      // The split of region b24d8ccb852ff742f2a27d01b7f5853e failed and was rolled back. The flushRequested flag was still true, so all handlers ended up blocked.

      2011-12-10 17:34:01,293 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Cleaned up old failed split transaction detritus: hdfs://193.195.18.121:9000/hbase/Htable_UFDR_004/b24d8ccb852ff742f2a27d01b7f5853e/splits
      2011-12-10 17:34:01,294 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.; next sequenceid=15494173
      2011-12-10 17:34:01,295 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Successful rollback of failed split of Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.
      2011-12-10 17:43:10,147 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 19 on 20020' on region Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: memstore size 384.0m is >= than blocking 384.0m size

      // All handlers were blocked, so the cluster could not provide service.

      2011-12-10 17:34:01,295 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Successful rollback of failed split of Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.
      2011-12-10 17:43:10,147 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 19 on 20020' on region Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: memstore size 384.0m is >= than blocking 384.0m size
      2011-12-10 17:43:10,192 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 34 on 20020' on region Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: memstore size 384.0m is >= than blocking 384.0m size
      2011-12-10 17:43:10,193 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 51 on 20020' on region Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: memstore size 384.0m is >= than blocking 384.0m size
      2011-12-10 17:43:10,196 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 85 on 20020' on region Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: memstore size 384.0m is >= than blocking 384.0m size
      2011-12-10 17:43:10,199 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 88 on 20020' on region Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: memstore size 384.0m is >= than blocking 384.0m size
      2011-12-10 17:43:10,202 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 44 on 20020' on region Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: memstore size 384.0m is >= than blocking 384.0m size
      2011-12-10 17:43:11,663 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 2 on 20020' on region Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: memstore size 384.0m is >= than blocking 384.0m size
      2011-12-10 17:43:11,665 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 10 on 20020' on region Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: memstore size 384.0m is >= than blocking 384.0m size
      2011-12-10 17:43:11,670 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 75 on 20020' on region Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: memstore size 384.0m is >= than blocking 384.0m size
      2011-12-10 17:43:11,671 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 98 on 20020' on region Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: memstore size 384.0m is >= than blocking 384.0m size
      2011-12-10 17:43:11,680 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 11 on 20020' on region
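
      The sequence in the logs above can be reproduced with a small toy model. This is only a sketch, not HBase code: ToyRegion, its fields, the size thresholds, and the clearFlagOnSkip switch are all hypothetical names chosen for illustration, and clearing the flag on a skipped flush is one possible remedy, not necessarily what the attached patch does.

```java
/**
 * Toy model (NOT HBase code): shows how a flush-request flag that stays set
 * after a skipped flush suppresses every later flush request.
 */
public class StuckFlushSketch {

    static class ToyRegion {
        // Arbitrary units; hypothetical thresholds for the sketch only.
        static final long FLUSH_SIZE = 64;
        static final long BLOCKING_SIZE = 384;

        // Hypothetical switch: false = behavior described in the report,
        // true = one possible remedy (reset the flag when a flush is skipped).
        private final boolean clearFlagOnSkip;

        private boolean flushRequested = false;
        private boolean closing = false;
        private long memstoreSize = 0;
        private int queuedFlushes = 0;

        ToyRegion(boolean clearFlagOnSkip) {
            this.clearFlagOnSkip = clearFlagOnSkip;
        }

        /** Returns false when the write would block because the memstore is full. */
        boolean put(long bytes) {
            if (memstoreSize >= BLOCKING_SIZE) {
                return false; // a real handler thread would block here
            }
            memstoreSize += bytes;
            if (memstoreSize > FLUSH_SIZE) {
                requestFlush();
            }
            return true;
        }

        private void requestFlush() {
            if (flushRequested) {
                return; // a flush is believed to be queued already, so ignore
            }
            flushRequested = true;
            queuedFlushes++;
        }

        /** Stands in for the flusher thread handling one queued request. */
        void runQueuedFlush() {
            if (queuedFlushes == 0) {
                return;
            }
            queuedFlushes--;
            if (closing) {
                // Flush skipped because the region is closing.
                if (clearFlagOnSkip) {
                    flushRequested = false;
                }
                return;
            }
            memstoreSize = 0;
            flushRequested = false;
        }

        void startClose() { closing = true; }

        void rollbackClose() { closing = false; }
    }

    /** Replays the reported sequence; returns true if writes are still accepted at the end. */
    public static boolean simulate(boolean clearFlagOnSkip) {
        ToyRegion region = new ToyRegion(clearFlagOnSkip);
        region.put(100);         // crosses FLUSH_SIZE, so a flush is queued
        region.startClose();     // split starts closing the region
        region.runQueuedFlush(); // flush is skipped because the region is closing
        region.rollbackClose();  // split fails and the region comes back online
        for (int i = 0; i < 10; i++) {
            region.put(100);
            region.runQueuedFlush();
        }
        return region.put(1);
    }

    public static void main(String[] args) {
        System.out.println("flag left set on skip, writes accepted: " + simulate(false));
        System.out.println("flag cleared on skip, writes accepted: " + simulate(true));
    }
}
```

      With the flag left set, the toy memstore grows past the blocking size and the final write is refused (the first line prints false). With the flag cleared on the skipped flush, later requests are queued again and writes continue (the second line prints true).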

        Activity

        gaojinchao added a comment -

        I made a patch. Please review.

        gaojinchao added a comment -

        TestSplitTransactionOnCluster and TestSplitTransaction have passed.
        The full test suite is still running; I will report the result tomorrow.

        Lars Hofhansl added a comment -

        If I understand this correctly:

        1. A requested flush was canceled (because of a split?), and flushRequested was never unset.
        2. From this point on, every new flush request is ignored because flushRequested is already true.

        The change seems sensible, although I do not know this part of the code very well. Can flushRequested ever be legitimately true at this point?

        ramkrishna.s.vasudevan added a comment -

        +1 on patch. Good catch and good analysis. Maybe we can also apply it to trunk if the problem exists there.

        stack added a comment -

        +1 on patch. +1 on applying to trunk and 0.92 too. Nice fix, Jinchao.

        stack added a comment -

        Thanks for the patch, Jinchao. Applied to trunk and two branches.

        Hudson added a comment -

        Integrated in HBase-TRUNK-security #29 (See https://builds.apache.org/job/HBase-TRUNK-security/29/)
        HBASE-5008 The clusters can't provide services because Region can't flush.

        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        Hudson added a comment -

        Integrated in HBase-TRUNK #2540 (See https://builds.apache.org/job/HBase-TRUNK/2540/)
        HBASE-5008 The clusters can't provide services because Region can't flush.

        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        Hudson added a comment -

        Integrated in HBase-0.92 #184 (See https://builds.apache.org/job/HBase-0.92/184/)
        HBASE-5008 The clusters can't provide services because Region can't flush.

        stack :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        Hudson added a comment -

        Integrated in HBase-0.92-security #37 (See https://builds.apache.org/job/HBase-0.92-security/37/)
        HBASE-5008 The clusters can't provide services because Region can't flush.

        stack :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java

          People

          • Assignee: gaojinchao
          • Reporter: gaojinchao
          • Votes: 0
          • Watchers: 4
