HBase
  1. HBase
  2. HBASE-5100

Rollback of split could cause closed region to be opened again

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.92.0, 0.94.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      If master sending close region to rs and region's split transaction concurrently happen,
      it may cause closed region to opened.

      See the detailed code in SplitTransaction#createDaughters

      List<StoreFile> hstoreFilesToSplit = null;
          try{
            hstoreFilesToSplit = this.parent.close(false);
            if (hstoreFilesToSplit == null) {
              // The region was closed by a concurrent thread.  We can't continue
              // with the split, instead we must just abandon the split.  If we
              // reopen or split this could cause problems because the region has
              // probably already been moved to a different server, or is in the
              // process of moving to a different server.
              throw new IOException("Failed to close region: already closed by " +
                "another thread");
            }
          } finally {
            this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
          }
      

      when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes this.parent.initialize();

      Although this region is not onlined in the regionserver, it may bring some potential problem.

      For example, in our environment, the closed parent region is rolled back sucessfully , and then starting compaction and split again.

      The parent region is f892dd6107b6b4130199582abc78e9c1

      master log

      2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., src=dw87.kgb.sqa.cm4,60020,1324827866085, dest=dw80.kgb.sqa.cm4,60020,1324827865780
      2011-12-26 00:24:42,693 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. (offlining)
      2011-12-26 00:24:42,694 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
      2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
      2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1
      2011-12-26 00:24:45,348 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1
      2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for f892dd6107b6b4130199582abc78e9c1
      2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. state=CLOSED, ts=1324830285347
      2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x13447f283f40e73 Creating (or updating) unassigned node for f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
      2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:60000, region=f892dd6107b6b4130199582abc78e9c1
      2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. destination server is + serverName=dw80.kgb.sqa.cm4,60020,1324827865780, load=(requests=0, regions=1, usedHeap=0, maxHeap=0)
      2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.; plan=hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., src=dw87.kgb.sqa.cm4,60020,1324827866085, dest=dw80.kgb.sqa.cm4,60020,1324827865780
      2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. to dw80.kgb.sqa.cm4,60020,1324827865780
      2011-12-26 00:24:46,899 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. on dw80.kgb.sqa.cm4,60020,1324827865780
      

      RE_dw87 log

      2011-12-26 00:24:42,694 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
      2011-12-26 00:24:42,694 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
      2011-12-26 00:24:42,694 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x43447f30cb31367 Creating unassigned node for f892dd6107b6b4130199582abc78e9c1 in a CLOSING state
      2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.: disabling compactions & flushes
      2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: waiting for compaction to complete for region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
      2011-12-26 00:24:43,340 INFO org.apache.hadoop.hbase.regionserver.HRegion: aborted compaction on region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. after 49sec
      2011-12-26 00:24:43,340 INFO org.apache.hadoop.hbase.regionserver.HRegion: Running close preflush of writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
      2011-12-26 00:24:43,340 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., current region memstore size 59.5m
      2011-12-26 00:24:43,340 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
      2011-12-26 00:24:45,347 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
      2011-12-26 00:24:45,347 WARN org.apache.hadoop.hbase.regionserver.HRegion: Region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. already closed
      2011-12-26 00:24:45,347 WARN org.apache.hadoop.hbase.regionserver.CompactSplitThread: Running rollback of failed split of writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.; Failed to close region: already closed by another thread
      2011-12-26 00:24:45,348 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x43447f30cb31367 Successfully transitioned node f892dd6107b6b4130199582abc78e9c1 from RS_ZK_REGION_CLOSING to RS_ZK_REGION_CLOSED
      2011-12-26 00:24:45,348 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
      2011-12-26 00:24:46,837 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.; next sequenceid=717341809
      2011-12-26 00:24:46,841 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Successful rollback of failed split of writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
      2011-12-26 00:24:46,841 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
      2011-12-26 00:25:23,288 INFO org.apache.hadoop.hbase.regionserver.HRegion: completed compaction on region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. after 36sec
      2011-12-26 00:25:23,288 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
      2011-12-26 00:25:24,847 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Offlined parent region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. in META
      2011-12-26 00:25:26,165 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Region split, META updated, and report to master. Parent=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., new regions: writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324830323288.595f458507cecf208640cb4a1be8e293., writetest,DHZX0CD7A4OE1KRILMWEJL2HTBN6OJVSFOKGU0P938DJ1M44B79C068NCZPXK1Z5OD2RQJ6LMA41TC0D44H05525TO3AHLZ4BZXX,1324830323288.ba9376c83327c34c7926fccb68c3b9e3.. Split took 2sec
      
      
      1. 5100-double-exeception.txt
        2 kB
        Ted Yu
      2. 5100-v2.txt
        1 kB
        Ted Yu
      3. hbase-5100.patch
        1 kB
        chunhui shen

        Activity

        Hide
        ramkrishna.s.vasudevan added a comment -

        Resolved as committed.

        Show
        ramkrishna.s.vasudevan added a comment - Resolved as committed.
        Hide
        Hudson added a comment -

        Integrated in HBase-TRUNK-security #56 (See https://builds.apache.org/job/HBase-TRUNK-security/56/)
        HBASE-5100 Rollback of split could cause closed region to be opened again (Chunhui)

        tedyu :
        Files :

        • /hbase/trunk/CHANGES.txt
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
        Show
        Hudson added a comment - Integrated in HBase-TRUNK-security #56 (See https://builds.apache.org/job/HBase-TRUNK-security/56/ ) HBASE-5100 Rollback of split could cause closed region to be opened again (Chunhui) tedyu : Files : /hbase/trunk/CHANGES.txt /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
        Hide
        Hudson added a comment -

        Integrated in HBase-TRUNK #2595 (See https://builds.apache.org/job/HBase-TRUNK/2595/)
        HBASE-5100 Rollback of split could cause closed region to be opened again (Chunhui)

        tedyu :
        Files :

        • /hbase/trunk/CHANGES.txt
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
        Show
        Hudson added a comment - Integrated in HBase-TRUNK #2595 (See https://builds.apache.org/job/HBase-TRUNK/2595/ ) HBASE-5100 Rollback of split could cause closed region to be opened again (Chunhui) tedyu : Files : /hbase/trunk/CHANGES.txt /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
        Hide
        Hudson added a comment -

        Integrated in HBase-0.92-security #55 (See https://builds.apache.org/job/HBase-0.92-security/55/)
        HBASE-5100 Rollback of split could cause closed region to be opened again (Chunhui)

        tedyu :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
        Show
        Hudson added a comment - Integrated in HBase-0.92-security #55 (See https://builds.apache.org/job/HBase-0.92-security/55/ ) HBASE-5100 Rollback of split could cause closed region to be opened again (Chunhui) tedyu : Files : /hbase/branches/0.92/CHANGES.txt /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
        Hide
        Hudson added a comment -

        Integrated in HBase-0.92 #220 (See https://builds.apache.org/job/HBase-0.92/220/)
        HBASE-5100 Rollback of split could cause closed region to be opened again (Chunhui)

        tedyu :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
        Show
        Hudson added a comment - Integrated in HBase-0.92 #220 (See https://builds.apache.org/job/HBase-0.92/220/ ) HBASE-5100 Rollback of split could cause closed region to be opened again (Chunhui) tedyu : Files : /hbase/branches/0.92/CHANGES.txt /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
        Hide
        Ted Yu added a comment -

        Thanks for the feedback, Chunhui.

        I integrated double exception patch to 0.92 and TRUNK.

        Thanks for initial patch, Chunhui.

        Thanks for the review, Stack.

        Show
        Ted Yu added a comment - Thanks for the feedback, Chunhui. I integrated double exception patch to 0.92 and TRUNK. Thanks for initial patch, Chunhui. Thanks for the review, Stack.
        Hide
        chunhui shen added a comment -

        @Zhihong
        I think both are ok now.
        I agree to commit 5100-double-exeception.txt since it is more understand understandable.

        Show
        chunhui shen added a comment - @Zhihong I think both are ok now. I agree to commit 5100-double-exeception.txt since it is more understand understandable.
        Hide
        Ted Yu added a comment - - edited

        I was trying to come up with easier to understand code.

        Chunhui's patch may be the one to use. 5100-v2.txt is the candidate with closedJE renamed.

        Show
        Ted Yu added a comment - - edited I was trying to come up with easier to understand code. Chunhui's patch may be the one to use. 5100-v2.txt is the candidate with closedJE renamed.
        Hide
        stack added a comment -

        Whats happening now in this issue? There is a v2. Is that now the candidate fix?

        Show
        stack added a comment - Whats happening now in this issue? There is a v2. Is that now the candidate fix?
        Hide
        Ted Yu added a comment -

        See discussion 'detecting presence of exception inside finally block' on seajug@yahoogroups.com where I polled Java developers on my proposed formation.

        Show
        Ted Yu added a comment - See discussion 'detecting presence of exception inside finally block' on seajug@yahoogroups.com where I polled Java developers on my proposed formation.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12508926/5100-double-exeception.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 javadoc. The javadoc tool appears to have generated -151 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.mapred.TestTableMapReduce
        org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/641//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/641//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/641//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508926/5100-double-exeception.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -151 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/641//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/641//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/641//console This message is automatically generated.
        Hide
        Ted Yu added a comment -

        Patch that covers runtime exception coming out of parent.close(false)

        Show
        Ted Yu added a comment - Patch that covers runtime exception coming out of parent.close(false)
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12508916/5100-v2.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 javadoc. The javadoc tool appears to have generated -151 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.mapred.TestTableMapReduce
        org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/639//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/639//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/639//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508916/5100-v2.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -151 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/639//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/639//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/639//console This message is automatically generated.
        Hide
        Ted Yu added a comment -

        Chunhui's patch with new boolean renamed

        Show
        Ted Yu added a comment - Chunhui's patch with new boolean renamed
        Hide
        Ted Yu added a comment -

        IOException should have been Exception that we catch.

        Show
        Ted Yu added a comment - IOException should have been Exception that we catch.
        Hide
        stack added a comment -

        Reviewing the other, older patch (Ted just wrote me to review it instead), it looks cleaner (hbase-5100.patch). I don't know what closedJE is. I'm +1 on commit but suggest we use a better name – perhaps closedParent – on commit or the above suggested long name (I don't think it too long)

        Show
        stack added a comment - Reviewing the other, older patch (Ted just wrote me to review it instead), it looks cleaner (hbase-5100.patch). I don't know what closedJE is. I'm +1 on commit but suggest we use a better name – perhaps closedParent – on commit or the above suggested long name (I don't think it too long)
        Hide
        chunhui shen added a comment -

        @Zhihong
        NPE doesn't belong to IOException, so it will not be catched, and will not add the JournalEntry.CLOSED_PARENT_REGION.

        Show
        chunhui shen added a comment - @Zhihong NPE doesn't belong to IOException, so it will not be catched, and will not add the JournalEntry.CLOSED_PARENT_REGION.
        Hide
        stack added a comment -

        This is a good catch Chunhui; nice one.

        Patch looks fine but seems a bit involved. I tried my hand at it and ended up w/ the below which is not much better so +1 on commit:

        Index: src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
        ===================================================================
        --- src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java    (revision 1221978)
        +++ src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java    (working copy)
        @@ -246,18 +246,24 @@
             this.journal.add(JournalEntry.CREATE_SPLIT_DIR);
          
             List<StoreFile> hstoreFilesToSplit = null;
        -    try{
        +    boolean someoneElseClosedRegion = false;
        +    try {
               hstoreFilesToSplit = this.parent.close(false);
               if (hstoreFilesToSplit == null) {
                 // The region was closed by a concurrent thread.  We can't continue
        -        // with the split, instead we must just abandon the split.  If we
        -        // reopen or split this could cause problems because the region has
        -        // probably already been moved to a different server, or is in the
        -        // process of moving to a different server.
        +        // with the split. Instead we must abandon the split.  If we reopen
        +        // or split this could cause problems because the region has probably
        +        // already been moved to a different server, or is in the process of
        +        // moving to a different server.  Set boolean.  See finally clause
        +        // for how its handled.
        +        someoneElseClosedRegion = true;
        +      }
        +    } finally {
        +      if (someoneElseClosedRegion) {
        +        // If closed by someone else, don't add journal entry.
                 throw new IOException("Failed to close region: already closed by " +
        -          "another thread");
        +          another thread");
               }
        -    } finally {
               this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
             }
        
        Show
        stack added a comment - This is a good catch Chunhui; nice one. Patch looks fine but seems a bit involved. I tried my hand at it and ended up w/ the below which is not much better so +1 on commit: Index: src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java =================================================================== --- src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java (revision 1221978) +++ src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java (working copy) @@ -246,18 +246,24 @@ this .journal.add(JournalEntry.CREATE_SPLIT_DIR); List<StoreFile> hstoreFilesToSplit = null ; - try { + boolean someoneElseClosedRegion = false ; + try { hstoreFilesToSplit = this .parent.close( false ); if (hstoreFilesToSplit == null ) { // The region was closed by a concurrent thread. We can't continue - // with the split, instead we must just abandon the split. If we - // reopen or split this could cause problems because the region has - // probably already been moved to a different server, or is in the - // process of moving to a different server. + // with the split. Instead we must abandon the split. If we reopen + // or split this could cause problems because the region has probably + // already been moved to a different server, or is in the process of + // moving to a different server. Set boolean . See finally clause + // for how its handled. + someoneElseClosedRegion = true ; + } + } finally { + if (someoneElseClosedRegion) { + // If closed by someone else , don't add journal entry. throw new IOException( "Failed to close region: already closed by " + - "another thread" ); + another thread"); } - } finally { this .journal.add(JournalEntry.CLOSED_PARENT_REGION); }
        Hide
        Ted Yu added a comment -
        Show
        Ted Yu added a comment - There was no hung test in https://builds.apache.org/job/PreCommit-HBASE-Build/631/console
        Hide
        Ted Yu added a comment -
        Show
        Ted Yu added a comment - According to http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/NullPointerException.html , NPE is an Exception
        Hide
        chunhui shen added a comment -

        @Zhihong:
        I like the modify of 5100.txt,
        However,is it any possible to throw other exception in this.parent.close(false) ,such as NullPointerException? Maybe it is impossible and needn't worry.

        Show
        chunhui shen added a comment - @Zhihong: I like the modify of 5100.txt, However,is it any possible to throw other exception in this.parent.close(false) ,such as NullPointerException? Maybe it is impossible and needn't worry.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12508876/5100.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 javadoc. The javadoc tool appears to have generated -151 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.mapred.TestTableMapReduce
        org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/631//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/631//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/631//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508876/5100.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -151 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/631//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/631//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/631//console This message is automatically generated.
        Hide
        Ted Yu added a comment -

        TestMasterObserver passed locally on my MacBook.

        Show
        Ted Yu added a comment - TestMasterObserver passed locally on my MacBook.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12508859/5100.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 javadoc. The javadoc tool appears to have generated -151 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.coprocessor.TestMasterObserver
        org.apache.hadoop.hbase.mapred.TestTableMapReduce
        org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/627//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/627//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/627//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508859/5100.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -151 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.coprocessor.TestMasterObserver org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/627//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/627//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/627//console This message is automatically generated.
        Hide
        Ted Yu added a comment -

        I thought I like Java until I understood what createDaughters() should be doing - with Chunhui's patch, i.e.

        The case is that we need to handle two types of exceptions: the one coming out of parent.close(false) call and the IOE thrown from within the try block.

        I got some idea when I was driving this morning. 5100.txt is my proposed form where I tried to disambiguate the two types of exceptions by removing finally block.
        closedByOtherException is a singleton so the overhead of my proposal vs. introducing a boolean is negligible.

        I ran split-related tests and they passed:

          806  mt -Dtest=TestCoprocessorInterface
          807  mt -Dtest=TestSplitTransaction
          808  mt -Dtest=TestSplitTransactionOnCluster
          809  mt -Dtest=TestEndToEndSplitTransaction
          810  mt -Dtest=TestHRegion
        

        Please share your comments.

        Show
        Ted Yu added a comment - I thought I like Java until I understood what createDaughters() should be doing - with Chunhui's patch, i.e. The case is that we need to handle two types of exceptions: the one coming out of parent.close(false) call and the IOE thrown from within the try block. I got some idea when I was driving this morning. 5100.txt is my proposed form where I tried to disambiguate the two types of exceptions by removing finally block. closedByOtherException is a singleton so the overhead of my proposal vs. introducing a boolean is negligible. I ran split-related tests and they passed: 806 mt -Dtest=TestCoprocessorInterface 807 mt -Dtest=TestSplitTransaction 808 mt -Dtest=TestSplitTransactionOnCluster 809 mt -Dtest=TestEndToEndSplitTransaction 810 mt -Dtest=TestHRegion Please share your comments.
        Hide
        Jieshan Bean added a comment -

        The patch is good. If region has been closed by other thread, just abondon the split.That region should not be online again while rolling back. Meanwhile, we just need to clean the splitDir.

        Show
        Jieshan Bean added a comment - The patch is good. If region has been closed by other thread, just abondon the split.That region should not be online again while rolling back. Meanwhile, we just need to clean the splitDir.
        Hide
        ramkrishna.s.vasudevan added a comment -

        @Chunhui
        Good catch. What about addClosedJournalEntry?

        Other than that +1 on patch.

        Show
        ramkrishna.s.vasudevan added a comment - @Chunhui Good catch. What about addClosedJournalEntry? Other than that +1 on patch.
        Hide
        chunhui shen added a comment -

        @Zhihong
        En, it's better.

        Show
        chunhui shen added a comment - @Zhihong En, it's better.
        Hide
        Ted Yu added a comment -

        Current form of patch is fine.
        How about renaming closedJE as addJournalEntry ?

        Show
        Ted Yu added a comment - Current form of patch is fine. How about renaming closedJE as addJournalEntry ?
        Hide
        chunhui shen added a comment -

        @Zhihong
        If the initial boolean value is false, when encountering excepiton in this.parent.close(false), it will be false all the same which causes JournalEntry.CLOSED_PARENT_REGION not added in this.journal

        Show
        chunhui shen added a comment - @Zhihong If the initial boolean value is false, when encountering excepiton in this.parent.close(false), it will be false all the same which causes JournalEntry.CLOSED_PARENT_REGION not added in this.journal
        Hide
        Ted Yu added a comment -

        @Chunhui:
        My comment @ 28/Dec/11 06:35 renames the boolean whose initial value would be false.
        The renamed boolean carries negated value compared to that for closedJE.

        What do you think ?

        Show
        Ted Yu added a comment - @Chunhui: My comment @ 28/Dec/11 06:35 renames the boolean whose initial value would be false. The renamed boolean carries negated value compared to that for closedJE. What do you think ?
        Hide
        chunhui shen added a comment -

        @Zhihong
        If we remove the try/finally construct, when encountering excepiton in this.parent.close(false), the rollback of split would not do this.parent.initialize() because of no JournalEntry.CLOSED_PARENT_REGION.

        Show
        chunhui shen added a comment - @Zhihong If we remove the try/finally construct, when encountering excepiton in this.parent.close(false), the rollback of split would not do this.parent.initialize() because of no JournalEntry.CLOSED_PARENT_REGION.
        Hide
        Ted Yu added a comment -

        I think we can remove the try/finally construct and put this.journal.add(JournalEntry.CLOSED_PARENT_REGION); in else block of:

              if (hstoreFilesToSplit == null) {
        

        @Chunhui:
        What do you think ?

        Show
        Ted Yu added a comment - I think we can remove the try/finally construct and put this.journal.add(JournalEntry.CLOSED_PARENT_REGION); in else block of: if (hstoreFilesToSplit == null ) { @Chunhui: What do you think ?
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12508728/hbase-5100.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 javadoc. The javadoc tool appears to have generated -151 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 77 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.TestDrainingServer
        org.apache.hadoop.hbase.mapred.TestTableMapReduce
        org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/607//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/607//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/607//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508728/hbase-5100.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -151 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 77 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.TestDrainingServer org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/607//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/607//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/607//console This message is automatically generated.
        Hide
        chunhui shen added a comment -

        If returns not null, exceptionEncountered is also false.
        what about naming the boolean alreadyClosed, or closedBefore,or other similars...

        Show
        chunhui shen added a comment - If returns not null, exceptionEncountered is also false. what about naming the boolean alreadyClosed, or closedBefore,or other similars...
        Hide
        Ted Yu added a comment -

        I think closedJE means addClosedParentRegionJournalEntry.
        Since that name is too long, how about naming the boolean exceptionEncountered ?

        Show
        Ted Yu added a comment - I think closedJE means addClosedParentRegionJournalEntry. Since that name is too long, how about naming the boolean exceptionEncountered ?
        Hide
        chunhui shen added a comment -

        @Zhihong:
        Patch is submit

        Show
        chunhui shen added a comment - @Zhihong: Patch is submit
        Hide
        Ted Yu added a comment -

        @Chunhui:
        Do you have a patch ?

        Show
        Ted Yu added a comment - @Chunhui: Do you have a patch ?
        Hide
        chunhui shen added a comment - - edited

        When this.parent.close(false) returns null(It means region has already been closed), we needn't add the JournalEntry.CLOSED_PARENT_REGION

        Show
        chunhui shen added a comment - - edited When this.parent.close(false) returns null(It means region has already been closed), we needn't add the JournalEntry.CLOSED_PARENT_REGION

          People

          • Assignee:
            chunhui shen
            Reporter:
            chunhui shen
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development