HBase / HBASE-5356

region_mover.rb can hang if table region it belongs to is deleted.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.90.3, 0.92.0, 0.94.0
    • Fix Version/s: 0.98.0, 0.96.2, 0.99.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      I was testing the region_mover.rb script on a loaded HBase cluster and noticed that it can hang (and thereby hang a graceful shutdown) if a region it is attempting to move gets deleted by a table delete operation.

      Here's the start of the relevant stack dump:

      12/02/08 13:27:13 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table:
      org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: TestLoadAndVerify_1328735001040, row=TestLoadAndVerify_1328735001040,yC^P\xD7\x945\xD4,99999999999999
              at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:136)
              at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95)
              at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:649)
              at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:703)
              at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:594)
              at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:565)
              at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:416)
              at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
              at org.apache.hadoop.hbase.client.ScannerCallable.instantiateServer(ScannerCallable.java:63)
              at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1018)
              at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1104)
              at org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:1027)
              at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:535)
              at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(JavaMethod.java:525)
              at org.jruby.javasupport.JavaMethod.invokeDirect(JavaMethod.java:380)
              at org.jruby.java.invokers.InstanceMethodInvoker.call(InstanceMethodInvoker.java:58)
              at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:137)
              at usr.lib.hbase.bin.region_mover.method__7$RUBY$isSuccessfulScan(/usr/lib/hbase/bin/region_mover.rb:133)
              at usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan.call(usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan:65535)
              at usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan.call(usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan:65535)
              at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:171)
              at usr.lib.hbase.bin.region_mover.block_4$RUBY$__for__(/usr/lib/hbase/bin/region_mover.rb:326)
              at usr$lib$hbase$bin$region_mover#block_4$RUBY$__for__.call(usr$lib$hbase$bin$region_mover#block_4$RUBY$__for__:65535)
              at org.jruby.runtime.CompiledBlock.yield(CompiledBlock.java:133)
              at org.jruby.runtime.BlockBody.call(BlockBody.java:73)
              at org.jruby.runtime.Block.call(Block.java:89)
              at org.jruby.RubyProc.call(RubyProc.java:268)
              at org.jruby.RubyProc.call(RubyProc.java:228)
              at org.jruby.RubyProc$i$0$0$call.call(RubyProc$i$0$0$call.gen:65535)
              at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:209)
              at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:205)
              at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:137)
              at org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
              at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:103)
              at org.jruby.ast.WhileNode.interpret(WhileNode.java:131)
              at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:103)
              at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
              at org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
              at org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:169)
              at org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:171)
              at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:272)
              at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:114)
              at org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:123)
              at usr.lib.hbase.bin.region_mover.chained_26_rescue_4$RUBY$SYNTHETICunloadRegions(/usr/lib/hbase/bin/region_mover.rb:319)
              at usr.lib.hbase.bin.region_mover.method__25$RUBY$unloadRegions(/usr/lib/hbase/bin/region_mover.rb:313)
              at usr$lib$hbase$bin$region_mover#method__25$RUBY$unloadRegions.call(usr$lib$hbase$bin$region_mover#method__25$RUBY$unloadRegions:65535)
              at usr$lib$hbase$bin$region_mover#method__25$RUBY$unloadRegions.call(usr$lib$hbase$bin$region_mover#method__25$RUBY$unloadRegions:65535)
              at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:302)
              at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:173)
              at usr.lib.hbase.bin.region_mover.__file__(/usr/lib/hbase/bin/region_mover.rb:430)
              at usr.lib.hbase.bin.region_mover.load(/usr/lib/hbase/bin/region_mover.rb)
              at org.jruby.Ruby.runScript(Ruby.java:670)
              at org.jruby.Ruby.runNormally(Ruby.java:574)
              at org.jruby.Ruby.runFromMain(Ruby.java:423)
              at org.jruby.Main.doRunFromMain(Main.java:278)
              at org.jruby.Main.internalRun(Main.java:198)
              at org.jruby.Main.run(Main.java:164)
              at org.jruby.Main.run(Main.java:148)
              at org.jruby.Main.main(Main.java:128)
      

          Activity

          Transition                  | Time In Source Status | Execution Times | Last Executer | Last Execution Date
          Open -> Patch Available     | 729d 2h 49m           | 1               | Jimmy Xiang   | 07/Feb/14 01:02
          Patch Available -> Resolved | 16h 57m               | 1               | Jimmy Xiang   | 07/Feb/14 18:00
          Resolved -> Closed          | 379d 5h 33m           | 1               | Enis Soztutar | 21/Feb/15 23:33
          Hudson added a comment -

          SUCCESS: Integrated in HBase-0.94-security #584 (See https://builds.apache.org/job/HBase-0.94-security/584/)
          HBASE-12921 Port HBASE-5356 'region_mover.rb can hang if table region it belongs to is deleted' to 0.94. (Liu Shaohui) (larsh: rev 4a1d46487cd83bb6f62b6367f688fbb44bbc6f82)

          • bin/region_mover.rb
          Hudson added a comment -

          FAILURE: Integrated in HBase-0.94-JDK7 #239 (See https://builds.apache.org/job/HBase-0.94-JDK7/239/)
          HBASE-12921 Port HBASE-5356 'region_mover.rb can hang if table region it belongs to is deleted' to 0.94. (Liu Shaohui) (larsh: rev 4a1d46487cd83bb6f62b6367f688fbb44bbc6f82)

          • bin/region_mover.rb
          Hudson added a comment -

          FAILURE: Integrated in HBase-0.94 #1471 (See https://builds.apache.org/job/HBase-0.94/1471/)
          HBASE-12921 Port HBASE-5356 'region_mover.rb can hang if table region it belongs to is deleted' to 0.94. (Liu Shaohui) (larsh: rev 4a1d46487cd83bb6f62b6367f688fbb44bbc6f82)

          • bin/region_mover.rb
          Enis Soztutar made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Enis Soztutar added a comment -

          Closing this issue after 0.99.0 release.

          Esteban Gutierrez made changes -
          Link This issue is duplicated by HBASE-8490 [ HBASE-8490 ]
          Hudson added a comment -

          SUCCESS: Integrated in hbase-0.96-hadoop2 #197 (See https://builds.apache.org/job/hbase-0.96-hadoop2/197/)
          HBASE-5356 region_mover.rb can hang if table region it belongs to is deleted (jxiang: rev 1565744)

          • /hbase/branches/0.96/bin/region_mover.rb
          Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #83 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/83/)
          HBASE-5356 region_mover.rb can hang if table region it belongs to is deleted (jxiang: rev 1565742)

          • /hbase/trunk/bin/region_mover.rb
          Hudson added a comment -

          SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #129 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/129/)
          HBASE-5356 region_mover.rb can hang if table region it belongs to is deleted (jxiang: rev 1565743)

          • /hbase/branches/0.98/bin/region_mover.rb
          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK #4899 (See https://builds.apache.org/job/HBase-TRUNK/4899/)
          HBASE-5356 region_mover.rb can hang if table region it belongs to is deleted (jxiang: rev 1565742)

          • /hbase/trunk/bin/region_mover.rb
          Hudson added a comment -

          SUCCESS: Integrated in hbase-0.96 #285 (See https://builds.apache.org/job/hbase-0.96/285/)
          HBASE-5356 region_mover.rb can hang if table region it belongs to is deleted (jxiang: rev 1565744)

          • /hbase/branches/0.96/bin/region_mover.rb
          Hudson added a comment -

          SUCCESS: Integrated in HBase-0.98 #139 (See https://builds.apache.org/job/HBase-0.98/139/)
          HBASE-5356 region_mover.rb can hang if table region it belongs to is deleted (jxiang: rev 1565743)

          • /hbase/branches/0.98/bin/region_mover.rb
          Jimmy Xiang added a comment -

          Integrated into trunk, 0.98, and 0.96. Thanks.

          Jimmy Xiang made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Fix Version/s 0.98.0 [ 12323143 ]
          Fix Version/s 0.96.2 [ 12325658 ]
          Fix Version/s 0.99.0 [ 12325675 ]
          Resolution Fixed [ 1 ]
          Jimmy Xiang added a comment -

          Good question. I put a break point before moving a region, then I tried to move a region after the table was either disabled or deleted. The script works well for these scenarios now.

          stack added a comment -

          lgtm Jimmy. How'd you test it?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12627521/hbase-5356.patch
          against trunk revision .
          ATTACHMENT ID: 12627521

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8621//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8621//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8621//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8621//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8621//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8621//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8621//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8621//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8621//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8621//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8621//console

          This message is automatically generated.

          Jimmy Xiang added a comment -

          Posted a patch that handles the cases where a table is disabled or deleted while its regions are being moved.

          Jimmy Xiang made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Jimmy Xiang made changes -
          Attachment hbase-5356.patch [ 12627521 ]
          Jimmy Xiang made changes -
          Assignee Jonathan Hsieh [ jmhsieh ] Jimmy Xiang [ jxiang ]
          Jonathan Hsieh added a comment -

          rephrase of previous comment: The region_mover script initially caches a list of regions to move. If the table is deleted after the cached list is gathered but before all regions are moved, the script can get stuck attempting to move a deleted region.

          In a related but likely separate issue – if a new presplit table is created while the region_mover is emptying an RS, the emptying RS is still a candidate for the new regions and will receive some of them. Ideally this RS would be fenced off so that no regions are assigned to it, but this would require some ZooKeeper coordination. This seems less important because the majority of regions will be moved off the RS, and the few new regions can rely on automatic failover to other RSs.
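          A minimal sketch of the stale-list problem from the first paragraph of this comment, and of one way to narrow it: re-check each cached region's table just before moving it. This is plain Ruby, not the actual region_mover.rb code; `Region`, `unload_regions`, and the `live_tables` set (standing in for a lookup of tables that still exist and are enabled) are hypothetical names chosen for illustration.

```ruby
require 'set'

# Hypothetical model of a cached region entry: owning table and region name.
Region = Struct.new(:table, :name)

# Walks a region list that was cached up front. Before each (simulated)
# move it re-checks that the region's table is still live, so a table
# deleted or disabled after the list was gathered is skipped rather than
# retried.
def unload_regions(cached_regions, live_tables)
  moved = []
  skipped = []
  cached_regions.each do |region|
    unless live_tables.include?(region.table)
      skipped << region.name # table is gone: drop the stale entry
      next
    end
    moved << region.name # stand-in for the real move call
  end
  [moved, skipped]
end

# Usage: table 't2' was deleted after the list was cached.
regions = [Region.new('t1', 'r1'), Region.new('t2', 'r2'), Region.new('t1', 'r3')]
moved, skipped = unload_regions(regions, Set['t1'])
p moved   # => ["r1", "r3"]
p skipped # => ["r2"]
```

          The pre-check only narrows the race: a table can still be deleted between the check and the move, so catching TableNotFoundException at move time is still needed.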

          Jonathan Hsieh added a comment -

          Related issue – if you create a new table while the region mover is working from an old list, new regions get assigned to the region server we are trying to gracefully remove regions from.

          Jonathan Hsieh made changes -
          Assignee Jonathan Hsieh [ jmhsieh ]
          Jonathan Hsieh made changes -
          Description: the stack dump was extended. The original value ended at the isSuccessfulScan frames (region_mover.rb:133); the new value is the full stack dump shown in the Description above.
          Jonathan Hsieh added a comment -

          Adding more of the stack trace because the region_mover.rb line numbers are helpful.

          Jonathan Hsieh added a comment -

          Looks like we just need to properly catch TableNotFoundExceptions and continue.

          Also we should probably abort if it gets another kind of exception that it cannot handle.

          A separate issue should probably update the graceful shutdown script so that it also fails fast on unexpected failures.
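          The handling suggested above (skip a region whose table is gone, abort on anything else) might look like the following plain-Ruby sketch. `TableNotFoundError` is a hypothetical stand-in for `org.apache.hadoop.hbase.TableNotFoundException`, and `move_all` with its block argument is an illustrative name, not part of region_mover.rb; in the real JRuby script the Java exception class would presumably be rescued instead.

```ruby
# Hypothetical stand-in for org.apache.hadoop.hbase.TableNotFoundException.
class TableNotFoundError < StandardError; end

# Calls the given block (the "move") once per region. A region whose table
# has been deleted is logged and skipped; any other exception is not
# rescued, so it propagates and aborts the whole run instead of hanging.
def move_all(regions)
  skipped = []
  regions.each do |region|
    begin
      yield region
    rescue TableNotFoundError => e
      warn "Skipping #{region}: #{e.message}"
      skipped << region
    end
  end
  skipped
end

# Usage: the table holding 'r2' disappears mid-run.
skipped = move_all(%w[r1 r2 r3]) do |region|
  raise TableNotFoundError, 'table deleted' if region == 'r2'
end
p skipped # => ["r2"]
```

          Rescuing only the one expected exception class, rather than `StandardError` broadly, is what gives the fail-fast behavior: an unexpected error surfaces to the caller immediately.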

          Jonathan Hsieh created issue -

            People

            • Assignee: Jimmy Xiang
            • Reporter: Jonathan Hsieh
            • Votes: 0
            • Watchers: 6
