HBASE-2461

Split doesn't handle IOExceptions when creating new region reference files

    Details

    • Hadoop Flags:
      Reviewed

      Description

      I was testing an HDFS patch which had a bug in it, so it happened to throw an NPE during a split with the following trace:

      2010-04-16 19:18:20,727 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction failed for region TestTable,-1945465867<1271449232310>,1271453785648
      java.lang.NullPointerException
      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.enqueueCurrentPacket(DFSClient.java:3124)
      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(DFSClient.java:3220)
      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3306)
      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3255)
      at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
      at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
      at org.apache.hadoop.fs.FileSystem.createNewFile(FileSystem.java:560)
      at org.apache.hadoop.hbase.util.FSUtils.create(FSUtils.java:95)
      at org.apache.hadoop.hbase.io.Reference.write(Reference.java:129)
      at org.apache.hadoop.hbase.regionserver.StoreFile.split(StoreFile.java:498)
      at org.apache.hadoop.hbase.regionserver.HRegion.splitRegion(HRegion.java:682)
      at org.apache.hadoop.hbase.regionserver.CompactSplitThread.split(CompactSplitThread.java:162)
      at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:95)

      After that, my region was gone; any further writes to it would fail.

      1. 2461.txt
        8 kB
        stack
      2. 2461-v10.txt
        71 kB
        stack
      3. 2461-v2.txt
        22 kB
        stack
      4. 2461-v3.txt
        36 kB
        stack
      5. 2461-v4.txt
        43 kB
        stack
      6. 2461-v6.txt
        53 kB
        stack
      7. 2461-v7.txt
        70 kB
        stack
      8. 2461-v8.txt
        70 kB
        stack
      9. ugly_but_might_work.txt
        2 kB
        stack

          Activity

          stack added a comment -

          Committed.

          stack added a comment -

          Here's what I committed... includes j-d comments.

          HBase Review Board added a comment -

          Message from: "Jean-Daniel Cryans" <jdcryans@apache.org>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/474/#review643
          -----------------------------------------------------------

          Ship it!

          A few minor comments; otherwise it looks very good.

          src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java
          <http://review.cloudera.org/r/474/#comment2414>

          I wonder how that will play with new master code

          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          <http://review.cloudera.org/r/474/#comment2415>

          garbage

          src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
          <http://review.cloudera.org/r/474/#comment2416>

          red

          src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
          <http://review.cloudera.org/r/474/#comment2417>

          Here we could retry for a long time if the region server that holds meta/root died not long ago, blocking accesses to that region.

          src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
          <http://review.cloudera.org/r/474/#comment2418>

          incomplete doc?

          src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
          <http://review.cloudera.org/r/474/#comment2419>

          cleanup

          src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
          <http://review.cloudera.org/r/474/#comment2420>

          good idea

          • Jean-Daniel
          stack added a comment -

          Includes a minor fix: we weren't returning when the rollback failed.

          I looked at adding a test that fails a split on a running cluster, but the contortions are too extreme. CompactSplitThread (CST) is created in HRS#reinitialize() and our split is run inside CST#split, so we'd first need to put our own CST in place, and for that CST has to be subclassable or amenable to injection. Currently it is not. I could make what we load for CST configurable, but then what about the other threads (LogRoller, the Worker thread, etc.)? Why not make them configurable while I'm at it?

          ...but then I shouldn't even be doing this. There are containers that will stitch it all together for us and that can easily be changed at test time to run an alternative. See http://www.picocontainer.org/ or Spring.

          And the RS and Master are about to change w/ master rewrite.

          I'm going to pass on trying to test split failure on running cluster till at least after master rewrite goes in.
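
As a rough illustration of the kind of seam discussed in this comment, here is a minimal Java sketch in which a server-like class obtains its splitter from a factory, so a test can substitute one that fails. Every name in it (`InjectableServerSketch`, `Splitter`, `DefaultSplitter`, `requestSplit`) is hypothetical; it does not reflect how HRegionServer or CompactSplitThread is actually wired.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of constructor injection; not HBase code. */
class InjectableServerSketch {
  /** Stand-in for whatever component runs the split. */
  interface Splitter {
    String split(String region);
  }

  /** Stand-in for the production implementation. */
  static class DefaultSplitter implements Splitter {
    public String split(String region) {
      return "split " + region;
    }
  }

  private final Splitter splitter;

  /** Production path: uses the default splitter. */
  InjectableServerSketch() {
    this(DefaultSplitter::new);
  }

  /** Test path: the caller supplies the splitter, e.g. one that throws. */
  InjectableServerSketch(Supplier<Splitter> factory) {
    this.splitter = factory.get();
  }

  /** Runs the split and reports a failure instead of propagating it. */
  String requestSplit(String region) {
    try {
      return splitter.split(region);
    } catch (RuntimeException e) {
      return "split of " + region + " failed: " + e.getMessage();
    }
  }
}
```

A test could then construct the class with a factory whose splitter throws, and check how the failure is handled, without touching the other threads the server starts.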

          HBase Review Board added a comment -

          Message from: stack@duboce.net

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/474/
          -----------------------------------------------------------

          (Updated 2010-08-03 13:10:44.610359)

          Review request for hbase.

          Changes
          -------

          Found a minor issue where we weren't returning if the rollback failed.

          Summary
          -------

          Posting this to review board.

          Patch that keeps a journal during the split transaction. If the split fails, a call to rollback restores the parent to its original open state by backing out whatever transaction steps completed.

          The transaction spans the split checks, the closing of the parent region, and the creation of the daughters, up to the edit that offlines the parent in .META. Once the .META. edit has been made, we cannot roll back; we have to go forward. This means that the basescanner fixup, which adds missing daughter regions should the regionserver crash after the parent region edit but before it has added the daughters, is still required, in some form at least.
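
To make the journal-and-rollback idea concrete, here is a minimal Java sketch, not the code from the patch. The class name `JournaledSplitSketch`, the `Step` names, and the `failAt` fault-injection parameter are all assumptions made for illustration; the real steps only log strings here. It shows completed steps being journaled, rollback undoing them in reverse order, and rollback refusing to run once the step standing in for the .META. edit has completed.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a journaled transaction; not the code from the patch. */
class JournaledSplitSketch {
  /** Illustrative steps, in execution order. The last one is the point of no return. */
  enum Step { CREATE_SPLIT_DIR, CLOSE_PARENT, CREATE_DAUGHTERS, OFFLINE_PARENT_IN_META }

  private final List<Step> journal = new ArrayList<>(); // steps that completed
  final List<String> log = new ArrayList<>();           // trace of actions, for inspection
  private final Step failAt;                            // step that throws, or null for none

  JournaledSplitSketch(Step failAt) {
    this.failAt = failAt;
  }

  /** Runs each step in order, journaling it only once it has completed. */
  void execute() throws IOException {
    for (Step step : Step.values()) {
      if (step == failAt) {
        throw new IOException("injected failure at " + step);
      }
      log.add("do " + step);
      journal.add(step);
    }
  }

  /**
   * Undoes journaled steps in reverse order.
   * Returns false, undoing nothing, if the point of no return was passed.
   */
  boolean rollback() {
    if (journal.contains(Step.OFFLINE_PARENT_IN_META)) {
      return false;
    }
    for (int i = journal.size() - 1; i >= 0; i--) {
      log.add("undo " + journal.get(i));
    }
    journal.clear();
    return true;
  }

  public static void main(String[] args) {
    JournaledSplitSketch tx = new JournaledSplitSketch(Step.CREATE_DAUGHTERS);
    try {
      tx.execute();
    } catch (IOException e) {
      System.out.println("rolled back: " + tx.rollback());
    }
    System.out.println(tx.log);
  }
}
```

Journaling a step only after it completes is one possible choice; a step that can fail halfway and leave partial state would need to be journaled before it runs so that rollback cleans it up.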

          This patch includes a test of the new split code, but it only runs against an HRegion, not in server context. The split code is buried in the heart of the regionserver and created on startup. I stared at it for a while and injecting a fault was just forbidding. It's like bramble; there are so many spikes in the way of getting your finger down into the running split that I ended up passing on it.

          + M src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java
          (split): Most of the split code has been moved out to the new SplitTransaction class.
          Now this method prepares the split transaction, executes, and if failure does rollback.

          + M src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          (splitLock) Removed. Doesn't seem necessary. Just made close method synchronized.
          (SPLITDIR) Moved to new SplitTransaction
          Moved cleanup of half-done splits into SplitTransaction. It'll know better how to do this.
          Moved split code into SplitTransaction class.

          + M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          Made this class implement new OnlineRegions interface

          + A src/main/java/org/apache/hadoop/hbase/regionserver/OnlineRegions.java
          New interface that allows you to add/remove regions from the online regions. This interface
          adds little. I was just trying to make it so I didn't need server context when running
          tests, but in the end I just passed null for the case of no server context. Could remove
          this.

          + A src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
          New class that encapsulates all to do w/ splitting "transaction".

          + A src/main/java/org/apache/hadoop/hbase/util/PairOfSameType.java
          Minor utility class

          + M src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          (loadRegion) Added loading a region

          + M src/test/java/org/apache/hadoop/hbase/io/TestImmutableBytesWritable.java
          (testSpecificCompare) Unrelated change

          + M src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          Change because of new manner in which splits are run. Added a splitRegions method.

          + A src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java
          Test of region splitting code in region context. Testing in server context would take
          a bunch of work making it so could insert mock instance of SplitTransaction.
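
The prepare, execute, and roll-back-on-failure flow described for CompactSplitThread's split method in the list above might be sketched as follows. This is an assumption-laden illustration rather than the patch itself: `SplitCallerSketch`, its `Transaction` interface, and the `Outcome` enum are hypothetical names. Catching `RuntimeException` alongside `IOException` is a choice made here because the original report involved an NPE; the source does not say what the patch catches. The explicit return when rollback fails mirrors the "wasn't returning" fix mentioned in the update.

```java
import java.io.IOException;

/** Hypothetical sketch of the caller side of a split transaction; not HBase code. */
class SplitCallerSketch {
  /** Stand-in for the transaction object; method names are illustrative. */
  interface Transaction {
    boolean prepare();
    void execute() throws IOException;
    void rollback() throws IOException;
  }

  enum Outcome { NOT_SPLITTABLE, SPLIT, ROLLED_BACK, ROLLBACK_FAILED }

  static Outcome split(Transaction tx) {
    // Run the pre-split checks first; nothing has been changed if they fail.
    if (!tx.prepare()) {
      return Outcome.NOT_SPLITTABLE;
    }
    try {
      tx.execute();
    } catch (IOException | RuntimeException e) {
      try {
        tx.rollback();
        return Outcome.ROLLED_BACK;
      } catch (IOException | RuntimeException rollbackFailure) {
        // Return here rather than falling through as if the split had worked.
        return Outcome.ROLLBACK_FAILED;
      }
    }
    return Outcome.SPLIT;
  }
}
```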

          This addresses bug HBASE-2461.
          http://issues.apache.org/jira/browse/HBASE-2461

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java 7589db3
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 6dc41a4
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 6a54736
          src/main/java/org/apache/hadoop/hbase/regionserver/OnlineRegions.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/util/PairOfSameType.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 4d09fe9
          src/test/java/org/apache/hadoop/hbase/io/TestImmutableBytesWritable.java 43fa6dd
          src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 98bd3e5
          src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java PRE-CREATION

          Diff: http://review.cloudera.org/r/474/diff

          Testing
          -------

          Basic unit tests are passing.

          Thanks,

          stack

          stack added a comment -

          This might work for injecting a mocked SplitTransaction so I can test failed split in running cluster. Would need to provide alternate HRS implementation but unit test utility is already setup to allow this.

          HBase Review Board added a comment -

          Message from: stack@duboce.net

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/474/
          -----------------------------------------------------------

          Review request for hbase.

          Summary
          -------

          Posting this to review board.

          Patch that keeps a journal during the split transaction. If the split fails, a call to rollback restores the parent to its original open state by backing out whatever transaction steps completed.

          The transaction spans the split checks, the closing of the parent region, and the creation of the daughters, up to the edit that offlines the parent in .META. Once the .META. edit has been made, we cannot roll back; we have to go forward. This means that the basescanner fixup, which adds missing daughter regions should the regionserver crash after the parent region edit but before it has added the daughters, is still required, in some form at least.

          This patch includes a test of the new split code, but it only runs against an HRegion, not in server context. The split code is buried in the heart of the regionserver and created on startup. I stared at it for a while and injecting a fault was just forbidding. It's like bramble; there are so many spikes in the way of getting your finger down into the running split that I ended up passing on it.

          + M src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java
          (split): Most of the split code has been moved out to the new SplitTransaction class.
          Now this method prepares the split transaction, executes, and if failure does rollback.

          + M src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          (splitLock) Removed. Doesn't seem necessary. Just made close method synchronized.
          (SPLITDIR) Moved to new SplitTransaction
          Moved cleanup of half-done splits into SplitTransaction. It'll know better how to do this.
          Moved split code into SplitTransaction class.

          + M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          Made this class implement new OnlineRegions interface

          + A src/main/java/org/apache/hadoop/hbase/regionserver/OnlineRegions.java
          New interface that allows you to add/remove regions from the online regions. This interface
          adds little. I was just trying to make it so I didn't need server context when running
          tests, but in the end I just passed null for the case of no server context. Could remove
          this.

          + A src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
          New class that encapsulates all to do w/ splitting "transaction".

          + A src/main/java/org/apache/hadoop/hbase/util/PairOfSameType.java
          Minor utility class

          + M src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          (loadRegion) Added loading a region

          + M src/test/java/org/apache/hadoop/hbase/io/TestImmutableBytesWritable.java
          (testSpecificCompare) Unrelated change

          + M src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          Change because of new manner in which splits are run. Added a splitRegions method.

          + A src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java
          Test of region splitting code in region context. Testing in server context would take
          a bunch of work making it so could insert mock instance of SplitTransaction.

          This addresses bug HBASE-2461.
          http://issues.apache.org/jira/browse/HBASE-2461

          Diffs


          src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java 7589db3
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 6dc41a4
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 6a54736
          src/main/java/org/apache/hadoop/hbase/regionserver/OnlineRegions.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/util/PairOfSameType.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 4d09fe9
          src/test/java/org/apache/hadoop/hbase/io/TestImmutableBytesWritable.java 43fa6dd
          src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 98bd3e5
          src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java PRE-CREATION

          Diff: http://review.cloudera.org/r/474/diff

          Testing
          -------

          Basic unit tests are passing.

          Thanks,

          stack

          stack added a comment -

          Posting this to review board.

          Patch that keeps a journal during the split transaction. If the split fails, a call to rollback restores the parent to its original open state by backing out whatever transaction steps completed.

          The transaction spans the split checks, the closing of the parent region, and the creation of the daughters, up to the edit that offlines the parent in .META. Once the .META. edit has been made, we cannot roll back; we have to go forward. This means that the basescanner fixup, which adds missing daughter regions should the regionserver crash after the parent region edit but before it has added the daughters, is still required, in some form at least.

          This patch includes a test of the new split code, but it only runs against an HRegion, not in server context. The split code is buried in the heart of the regionserver and created on startup. I stared at it for a while and injecting a fault was just forbidding. It's like bramble; there are so many spikes in the way of getting your finger down into the running split that I ended up passing on it.

          + M src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java
          (split): Most of the split code has been moved out to the new SplitTransaction class.
          Now this method prepares the split transaction, executes, and if failure does rollback.

          + M src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          (splitLock) Removed. Doesn't seem necessary. Just made close method synchronized.
          (SPLITDIR) Moved to new SplitTransaction
          Moved cleanup of half-done splits into SplitTransaction. It'll know better how to do this.
          Moved split code into SplitTransaction class.

          + M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          Made this class implement new OnlineRegions interface

          + A src/main/java/org/apache/hadoop/hbase/regionserver/OnlineRegions.java
          New interface that allows you to add/remove regions from the online regions. This interface
          adds little. I was just trying to make it so I didn't need server context when running
          tests, but in the end I just passed null for the case of no server context. Could remove
          this.

          + A src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
          New class that encapsulates all to do w/ splitting "transaction".

          + A src/main/java/org/apache/hadoop/hbase/util/PairOfSameType.java
          Minor utility class

          + M src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          (loadRegion) Added loading a region

          + M src/test/java/org/apache/hadoop/hbase/io/TestImmutableBytesWritable.java
          (testSpecificCompare) Unrelated change

          + M src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          Change because of new manner in which splits are run. Added a splitRegions method.

          + A src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java
          Test of region splitting code in region context. Testing in server context would take
          a bunch of work making it so could insert mock instance of SplitTransaction.

          stack added a comment -

          This seems to work. Needs more tests first though before I post it to review board.

          stack added a comment -

          Implementation done. Tests next.

          stack added a comment -

          Taking a different approach.... not done yet.

          stack added a comment -

          v2 – still not done.. going to be fun testing this.

          stack added a comment -

          This issue highlights how exceptions post close of the region-to-be-split – a necessary action if the split is to come out clean – can poke a hole in an online table.

          This patch starts down a road of treating the split operation inside the regionserver as a 'transaction'. There is a prepare step and an execute step. Should the execute fail – the execute step includes things like closing the region and updating the meta table with the new split codes – then we'll call rollback. The rollback will try to fix up the failed split by doing things like reopening the region if appropriate and fixing up meta if necessary.

          If the rollback fails, we'll kill the regionserver so that the processing of the server shutdown gets the affected regions back online again.

          Patch is not ready yet.
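
          The prepare/execute/rollback flow described above can be illustrated roughly as follows. This is a minimal sketch, not the code in the attached patches: the class `SplitTransactionSketch`, the `Region` stand-in, the `Step` names and the `injectFailure` switch are all hypothetical, chosen only to show how a journal of completed steps lets a rollback reopen the parent region.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

public class SplitTransactionSketch {
    /** Hypothetical step names; a real implementation may journal different steps. */
    public enum Step { CREATED_SPLIT_DIR, CLOSED_PARENT, CREATED_DAUGHTERS }

    /** Hypothetical stand-in for a region; it only tracks the state we undo. */
    public static class Region {
        public boolean open = true;
        public boolean splitDirExists = false;
    }

    private final Region parent;
    private final boolean injectFailure;                   // simulates the IOException/NPE from the FS
    private final Deque<Step> journal = new ArrayDeque<>(); // most recent step on top

    public SplitTransactionSketch(Region parent, boolean injectFailure) {
        this.parent = parent;
        this.injectFailure = injectFailure;
    }

    /** Cheap checks made before anything is changed. */
    public boolean prepare() {
        return parent.open;
    }

    /** Runs the split, journaling each step only after it has completed. */
    public void execute() throws IOException {
        parent.splitDirExists = true;
        journal.push(Step.CREATED_SPLIT_DIR);

        parent.open = false;
        journal.push(Step.CLOSED_PARENT);

        if (injectFailure) {
            throw new IOException("simulated failure writing reference files");
        }
        journal.push(Step.CREATED_DAUGHTERS);
    }

    /** Undoes the completed steps in reverse order. */
    public void rollback() {
        while (!journal.isEmpty()) {
            switch (journal.pop()) {
                case CREATED_DAUGHTERS:
                    break; // a real rollback would remove the daughter directories
                case CLOSED_PARENT:
                    parent.open = true;
                    break;
                case CREATED_SPLIT_DIR:
                    parent.splitDirExists = false;
                    break;
            }
        }
    }

    /** Driver: prepare, execute, and roll back on failure. Returns true if the split completed. */
    public static boolean split(Region parent, boolean injectFailure) {
        SplitTransactionSketch st = new SplitTransactionSketch(parent, injectFailure);
        if (!st.prepare()) {
            return false;
        }
        try {
            st.execute();
            return true;
        } catch (IOException e) {
            st.rollback();
            return false;
        }
    }
}
```

          In this sketch a failed split leaves the parent open again, which is the property the original bug lacked. It does not model the case where the rollback itself fails, nor the point past which the transaction can only go forward.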

          stack added a comment -

          Inside split, at a particular point, we close the parent region. Thereafter, we start splitting the parent's files. If there is an exception, we'll come up out of the split method but the parent is not reopened. Parent should be set read-only or something and only closed after daughters are created and registered in META. Assigning myself.

          stack added a comment -

          Bulk move of 0.20.5 issues into 0.21.0 after vote that we merge branch into TRUNK up on list.

          Andrew Purtell added a comment -

          I think this, and all related tightening up of what we do on an IOE from the FS, should be a subtask of HBASE-1964. At the Hackathon we'd like to start chasing these down with a modified DFSClient into which we can inject faults, either randomly with adjustable probabilities or always on particular code paths.
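
          The two injection modes mentioned above (random with an adjustable probability, or always on particular code paths) could look something like the sketch below. `FaultInjector`, `maybeFail` and the code-path labels are hypothetical names used for illustration; this is not an existing DFSClient or HBase API.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class FaultInjector {
    private final double probability;        // chance of a random fault, in [0.0, 1.0]
    private final Random random;             // seeded so failing runs can be reproduced
    private final Set<String> alwaysFail;    // code paths that fail on every call

    public FaultInjector(double probability, long seed, String... alwaysFailPaths) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be between 0 and 1");
        }
        this.probability = probability;
        this.random = new Random(seed);
        this.alwaysFail = new HashSet<>(Arrays.asList(alwaysFailPaths));
    }

    /**
     * Call at the top of an instrumented method. Throws if the code path is
     * configured to always fail, or if the random draw falls under the probability.
     */
    public void maybeFail(String codePath) throws IOException {
        if (alwaysFail.contains(codePath) || random.nextDouble() < probability) {
            throw new IOException("injected fault at " + codePath);
        }
    }
}
```

          A wrapper around the filesystem client would call `maybeFail("close")` (or a similar label) before delegating to the real method, so tests can check how callers such as the split code react to the exception.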

          stack added a comment -

          Marking as a blocker and moving into 0.20.5 and 0.21.


            People

            • Assignee:
              stack
              Reporter:
              Todd Lipcon
            • Votes:
              0
              Watchers:
              1

                Development