HBASE-7475

TestUpgradeFromHFileV1ToEncoding.testUpgrade hangs

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.95.2
    • Fix Version/s: 0.95.0
    • Component/s: test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I'm having a look. Here is the stack I got locally:

      "pool-1-thread-1" prio=10 tid=0x00007f27c8406000 nid=0xf908 in Object.wait() [0x00007f27cec5b000]
         java.lang.Thread.State: WAITING (on object monitor)
              at java.lang.Object.wait(Native Method)
              - waiting on <0x00000000e88b19b0> (a org.apache.hadoop.hbase.util.JVMClusterUtil$RegionServerThread)
              at java.lang.Thread.join(Thread.java:1186)
              - locked <0x00000000e88b19b0> (a org.apache.hadoop.hbase.util.JVMClusterUtil$RegionServerThread)
              at java.lang.Thread.join(Thread.java:1239)
              at org.apache.hadoop.hbase.util.JVMClusterUtil.shutdown(JVMClusterUtil.java:245)
              at org.apache.hadoop.hbase.LocalHBaseCluster.shutdown(LocalHBaseCluster.java:430)
              at org.apache.hadoop.hbase.MiniHBaseCluster.shutdown(MiniHBaseCluster.java:501)
              at org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:856)
              at org.apache.hadoop.hbase.io.encoding.TestUpgradeFromHFileV1ToEncoding.testUpgrade(TestUpgradeFromHFileV1ToEncoding.java:83)
      
      "RegionServer:0;localhost,35592,1357148534219-splits-1357148554209" daemon prio=10 tid=0x0000000040ed1000 nid=0x1178 waiting on condition [0x00007f27b3d3c000]
         java.lang.Thread.State: TIMED_WAITING (sleeping)
              at java.lang.Thread.sleep(Native Method)
              at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1164)
              at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:966)
              at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:919)
              at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:246)
              at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:187)
              at org.apache.hadoop.hbase.catalog.MetaReader.getHTable(MetaReader.java:198)
              at org.apache.hadoop.hbase.catalog.MetaReader.getMetaHTable(MetaReader.java:224)
              at org.apache.hadoop.hbase.catalog.MetaEditor.offlineParentInMeta(MetaEditor.java:229)
              at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:341)
              at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:471)
              at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:68)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:662)
      
      "RegionServer:0;localhost,35592,1357148534219" prio=10 tid=0x00007f27c0929000 nid=0x5a7 waiting on condition [0x00007f27be5e3000]
         java.lang.Thread.State: TIMED_WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for  <0x00000000e88b1978> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
              at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
              at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1253)
              at org.apache.hadoop.hbase.regionserver.CompactSplitThread.waitFor(CompactSplitThread.java:252)
              at org.apache.hadoop.hbase.regionserver.CompactSplitThread.join(CompactSplitThread.java:261)
              at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:948)
              at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:151)
              at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:103)
              at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:135)
      
      1. 7475.addendum
        0.7 kB
        Ted Yu
      2. 7475.v1.patch
        7 kB
        Nicolas Liochon


          Activity

          stack stack added a comment -

          Marking closed.

          jmhsieh Jonathan Hsieh added a comment -

          It's been a month, so it seems that an addendum is not the right way to go here. I've linked this issue to HBASE-7778; let's keep the discussion of these newly found problems over there.

          yuzhihong@gmail.com Ted Yu added a comment -

          Proposed addendum.

          There is no need to interrupt current thread at the end of cluster shutdown.

          jmhsieh Jonathan Hsieh added a comment -

          adding resolution fixed version.

          hudson Hudson added a comment -

          Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #329 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/329/)
          HBASE-7475 TestUpgradeFromHFileV1ToEncoding.testUpgrade hangs (Revision 1428776)

          Result = FAILURE
          nkeywal :
          Files :

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/encoding/TestUpgradeFromHFileV1ToEncoding.java
          hudson Hudson added a comment -

          Integrated in HBase-TRUNK #3696 (See https://builds.apache.org/job/HBase-TRUNK/3696/)
          HBASE-7475 TestUpgradeFromHFileV1ToEncoding.testUpgrade hangs (Revision 1428776)

          Result = FAILURE
          nkeywal :
          Files :

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/encoding/TestUpgradeFromHFileV1ToEncoding.java
          nkeywal Nicolas Liochon added a comment -

          Use EnvironmentEdge above ?

          I think it's better not to: what we want to do here is shut down the cluster. If a test has replaced the EnvironmentEdge, we will be stuck.

          However, 30 seconds is a little bit too ambitious. I'm going to change this to 2 minutes.

          Thanks for the review!
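
          As an illustration of the point above, here is a minimal sketch of a shutdown wait bounded by the wall clock. It is not the committed patch: the class name BoundedJoinSketch, the method joinWithDeadline and the 100 ms polling slice are assumptions made for the example. The deadline is taken from System.currentTimeMillis() directly, so it still expires even if a test has swapped in a clock that never advances.

```java
public class BoundedJoinSketch {

    /**
     * Waits for the thread to terminate, but never longer than maxWaitMs of wall-clock time.
     * Returns true if the thread terminated, false if the deadline passed or we were interrupted.
     */
    static boolean joinWithDeadline(Thread t, long maxWaitMs) {
        // Real system clock on purpose: an injected test clock could be frozen.
        final long maxTime = System.currentTimeMillis() + maxWaitMs;
        while (t.isAlive()) {
            long remaining = maxTime - System.currentTimeMillis();
            if (remaining <= 0) {
                return false; // give up instead of hanging like Thread.join() without a timeout
            }
            try {
                // Never pass 0 here: join(0) means "wait forever".
                t.join(Math.min(remaining, 100));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a region server thread that does not stop.
        Thread stuck = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // interrupted: exit
            }
        });
        stuck.setDaemon(true);
        stuck.start();

        boolean stopped = joinWithDeadline(stuck, 500);
        System.out.println("stopped within deadline: " + stopped);
        if (!stopped) {
            stuck.interrupt(); // last resort once the deadline has passed
        }
    }
}
```

          The caller decides what to do when the method returns false; interrupting the thread is only one option.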

          yuzhihong@gmail.com Ted Yu added a comment -
          +    final long maxTime = System.currentTimeMillis() + 30 * 1000;
          

          Use EnvironmentEdge above ?

          nkeywal Nicolas Liochon added a comment -

          TestHCM is unrelated. I will commit this tomorrow if nobody disagrees.
          I will also have a look at TestHCM.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12563084/7475.v1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.client.TestHCM

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3821//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3821//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3821//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3821//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3821//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3821//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3821//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3821//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3821//console

          This message is automatically generated.

          nkeywal Nicolas Liochon added a comment -

          I wonder if we have more tests timing out now because of the change of the default split policy to IncreasingToUpperBoundRegionSplitPolicy.
          v1 fixes it for this specific test.

          nkeywal Nicolas Liochon added a comment -

          With the default policy, IncreasingToUpperBoundRegionSplitPolicy, we're not really using HConstants.HREGION_MAX_FILESIZE to decide if we split or not.

            /**
             * @return Region max size or <code>count of regions squared * flushsize</code>, whichever
             * is smaller; guard against there being zero regions on this server.
             */
            long getSizeToCheck(final int tableRegionsCount) {
              return tableRegionsCount == 0 ? getDesiredMaxFileSize() :
                Math.min(getDesiredMaxFileSize(),
                  this.flushSize * (tableRegionsCount * (long) tableRegionsCount));
            }
          

          It's in the class javadoc, but it's surprising.
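
          To make the arithmetic concrete, here is a small standalone sketch of the same formula. The class SplitSizeSketch and the method sizeToCheck are hypothetical names, and the 10 GB max file size and 128 MB flush size are assumed values chosen only for illustration, not settings taken from this test.

```java
public class SplitSizeSketch {

    /** Same formula as the quoted getSizeToCheck, with both sizes passed in explicitly. */
    static long sizeToCheck(long desiredMaxFileSize, long flushSize, int tableRegionsCount) {
        if (tableRegionsCount == 0) {
            return desiredMaxFileSize; // guard: no regions of this table on the server
        }
        // Widen to long before multiplying so the square cannot overflow an int.
        return Math.min(desiredMaxFileSize,
            flushSize * (tableRegionsCount * (long) tableRegionsCount));
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        final long maxFileSize = 10L * 1024L * mb; // assumed value: 10 GB
        final long flushSize = 128L * mb;          // assumed value: 128 MB

        for (int regions = 0; regions <= 10; regions++) {
            long threshold = sizeToCheck(maxFileSize, flushSize, regions);
            System.out.println(regions + " region(s): split threshold " + (threshold / mb) + " MB");
        }
    }
}
```

          With these assumed values, a table with a single region on the server is checked against 128 MB rather than the configured maximum, which is why a test that sets only the max file size can still see a split.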

          nkeywal Nicolas Liochon added a comment -

          Hmm. In a way, it's simple: we're doing a region split, and it seems that the test doesn't expect one.
          On the other hand, maybe there is a real bug behind this. I'm not sure, but basically the regionserver will not stop until the split is finished. To finish, it needs meta. And I guess meta is not there anymore as we're closing.

          Maybe we should not stop the regionserver holding meta if a split is in progress?

          Let's see first if disabling the split helps.
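
          The hang described above can be reproduced in miniature with a plain executor. This is only a sketch of the pattern, not HBase code: SplitPoolShutdownSketch, stopPool and retryingTask are hypothetical names, and the retry loop stands in for a split that keeps sleeping and retrying because meta is gone. The sketch waits for a grace period and then interrupts the workers instead of waiting indefinitely.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SplitPoolShutdownSketch {

    /**
     * Stops the pool. Returns true if it drained on its own within graceMs,
     * false if the workers had to be interrupted (or the caller was interrupted).
     */
    static boolean stopPool(ExecutorService pool, long graceMs) {
        pool.shutdown(); // no new tasks; running tasks continue
        try {
            if (pool.awaitTermination(graceMs, TimeUnit.MILLISECONDS)) {
                return true;
            }
            pool.shutdownNow(); // interrupt workers that are still retrying
            pool.awaitTermination(graceMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return false;
    }

    /** Hypothetical task that retries forever, like a split waiting for a missing meta region. */
    static Runnable retryingTask() {
        return () -> {
            try {
                while (true) {
                    Thread.sleep(50); // pause between retries
                }
            } catch (InterruptedException e) {
                // interrupted during shutdown: stop retrying
            }
        };
    }

    public static void main(String[] args) {
        ExecutorService splits = Executors.newSingleThreadExecutor();
        splits.submit(retryingTask());
        boolean drained = stopPool(splits, 500);
        System.out.println("drained without interrupt: " + drained);
        System.out.println("terminated: " + splits.isTerminated());
    }
}
```

          Interrupting only helps when the task reacts to interruption, as the sleeping retry loop here does.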


            People

            • Assignee:
              nkeywal Nicolas Liochon
              Reporter:
              nkeywal Nicolas Liochon
            • Votes:
              0
              Watchers:
              5
