HBase / HBASE-4832

TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.94.0
    • Fix Version/s: 0.94.0
    • Component/s: Coprocessors, test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The current implementation of HRegionServer#stop is

        public void stop(final String msg) {
          this.stopped = true;
          LOG.info("STOPPED: " + msg);
          synchronized (this) {
            // Wakes run() if it is sleeping
            notifyAll(); // FindBugs NN_NAKED_NOTIFY
          }
        }
      

      The notification is sent on the wrong object and does nothing. As a consequence, the region server continues to sleep instead of waking up and stopping immediately. A correct implementation is:

        public void stop(final String msg) {
          this.stopped = true;
          LOG.info("STOPPED: " + msg);
          // Wakes run() if it is sleeping
          sleeper.skipSleepCycle();
        }
      

      Then the region server stops immediately. On average this makes the region server stop 0.5 s faster, which is quite useful for unit tests.
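The difference between the two stop() variants can be reproduced outside HBase. The following is only a minimal sketch under stated assumptions: `Sleeper` and `Server` here are hypothetical stand-ins written for this illustration (not the HBase classes), and the 2-second period is arbitrary. It assumes, as the description implies, that the run loop waits on a lock owned by the sleeper rather than on the server object.

```java
import java.util.concurrent.TimeUnit;

public class StopWakeupSketch {
    /** Hypothetical stand-in for a sleeper: it waits on its own private lock. */
    static final class Sleeper {
        private final Object lock = new Object();
        private final long periodMs;
        private boolean skip = false;

        Sleeper(long periodMs) { this.periodMs = periodMs; }

        /** Sleeps up to periodMs, returning early if skipSleepCycle() is called. */
        void sleep() throws InterruptedException {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(periodMs);
            synchronized (lock) {
                while (!skip) {
                    long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                    if (remainingMs <= 0) break;
                    lock.wait(remainingMs);
                }
                skip = false;
            }
        }

        /** Wakes a thread blocked in sleep() by notifying the lock it waits on. */
        void skipSleepCycle() {
            synchronized (lock) {
                skip = true;
                lock.notifyAll();
            }
        }
    }

    /** Hypothetical server whose run loop sleeps between checks of the stop flag. */
    static final class Server implements Runnable {
        final Sleeper sleeper = new Sleeper(2000);
        volatile boolean stopped = false;

        @Override
        public void run() {
            try {
                while (!stopped) sleeper.sleep();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        /** Notifies the server's own monitor, which nobody waits on. */
        void stopNaked() {
            stopped = true;
            synchronized (this) { notifyAll(); }
        }

        /** Notifies the sleeper's lock, which run() is actually waiting on. */
        void stopFixed() {
            stopped = true;
            sleeper.skipSleepCycle();
        }
    }

    /** Returns how long the run loop takes to exit after stop is requested. */
    static long millisToStop(boolean fixed) throws InterruptedException {
        Server server = new Server();
        Thread t = new Thread(server);
        t.start();
        Thread.sleep(200); // give run() time to enter its sleep
        long start = System.nanoTime();
        if (fixed) server.stopFixed(); else server.stopNaked();
        t.join();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("notifyAll on this: " + millisToStop(false) + " ms");
        System.out.println("skipSleepCycle:    " + millisToStop(true) + " ms");
    }
}
```

In this sketch the first variant only exits once the current sleep period runs out, while the second exits almost immediately; the exact timings depend on the machine.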

      However, with this fix, TestRegionServerCoprocessorExceptionWithAbort no longer passes.
      This is likely because the test code does not expect the region server to stop that fast.

      The exception is:

      testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort)  Time elapsed: 30.06 sec  <<< ERROR!
      java.lang.Exception: test timed out after 30000 milliseconds
      	at java.lang.Throwable.fillInStackTrace(Native Method)
      	at java.lang.Throwable.<init>(Throwable.java:196)
      	at java.lang.Exception.<init>(Exception.java:41)
      	at java.lang.InterruptedException.<init>(InterruptedException.java:48)
      	at java.lang.Thread.sleep(Native Method)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
      	at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
      	at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
      	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
      	at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
      	at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
      	at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
      	at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)
      	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
      	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354)
      	at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:892)
      	at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:750)
      	at org.apache.hadoop.hbase.client.HTable.put(HTable.java:725)
      	at org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort.testExceptionFromCoprocessorDuringPut(TestRegionServerCoprocessorExceptionWithAbort.java:84)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
      	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62)
      

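      We get this exception because the client entered a retry loop: the stack trace shows the put being interrupted by the test timeout while sleeping in locateRegionInMeta.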
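The failure mode in the stack trace (a timeout interrupting a thread that is sleeping between retries) can be illustrated in isolation. This is a sketch under stated assumptions, not the HBase client code: `withRetries`, the 50 ms pause, and the always-failing operation are all hypothetical names and values chosen for the example.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetryLoopSketch {
    /** Retries op up to maxAttempts times, sleeping pauseMs between failed attempts. */
    static <T> T withRetries(Callable<T> op, int maxAttempts, long pauseMs) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                throw e; // never swallow an interrupt
            } catch (Exception e) {
                last = e; // remember the failure and retry
            }
            if (attempt < maxAttempts) {
                Thread.sleep(pauseMs); // an interrupt (e.g. a test timeout) surfaces here
            }
        }
        throw new Exception("gave up after " + maxAttempts + " attempts", last);
    }

    /**
     * Runs a retry loop that can never succeed (the "server" is gone) and
     * interrupts it after timeoutMs, the way a test-level timeout would.
     * Returns true if the loop ended with an InterruptedException.
     */
    static boolean interruptedWhileRetrying(long timeoutMs) throws InterruptedException {
        final boolean[] interrupted = {false};
        Callable<String> alwaysFails = () -> {
            throw new IOException("region server is gone");
        };
        Thread client = new Thread(() -> {
            try {
                withRetries(alwaysFails, Integer.MAX_VALUE, 50);
            } catch (InterruptedException e) {
                interrupted[0] = true;
            } catch (Exception e) {
                // not reached in this sketch: the attempt budget is effectively unbounded
            }
        });
        client.start();
        Thread.sleep(timeoutMs);
        client.interrupt();
        client.join();
        return interrupted[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("interrupted in retry loop: " + interruptedWhileRetrying(300));
    }
}
```

With a small `maxAttempts` the loop would instead give up on its own and report the last failure, which is usually easier to diagnose than a timeout.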

      Attachments

      1. 4832_trunk_hregionserver.patch
        0.7 kB
        Nicolas Liochon
      2. 4832-timeout.txt
        1.0 kB
        Ted Yu
      3. HBASE-4832.patch
        5 kB
        Eugene Koontz
      4. HBASE-4832.patch
        5 kB
        Eugene Koontz
      5. HBASE-4832.patch
        6 kB
        Eugene Koontz
      6. HBASE-4832.patch
        5 kB
        Eugene Koontz

          Activity

          Nicolas Liochon added a comment -

          4832_trunk_hregionserver.patch contains the fix on HRegionServer which makes the coprocessor test fail.

          Ted Yu added a comment -

          +1 on patch.

          Eugene Koontz added a comment -

          Origin of TestRegionServerCoprocessorExceptionWithAbort test.

          Ted Yu added a comment -

          Patch which stores timeout value in a static variable.

          Eugene Koontz added a comment -

          New version of the patch: parameterize test timeout (thanks to Ted Yu) and use this timeout amount in Thread.sleep() near end of testExceptionFromCoprocessorDuringPut().

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12504581/HBASE-4832.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -162 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 65 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.client.TestInstantSchemaChange
          org.apache.hadoop.hbase.client.TestAdmin

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/321//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/321//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/321//console

          This message is automatically generated.

          Ted Yu added a comment -

          The test failures were due to 'Too many open files'

          stack added a comment -

          You good w/ this nkeywal? (Thanks Eugene for hacking on this)

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12504672/4832-timeout.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -162 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.coprocessor.TestMasterObserver
          org.apache.hadoop.hbase.replication.TestReplication
          org.apache.hadoop.hbase.client.TestAdmin
          org.apache.hadoop.hbase.client.TestInstantSchemaChange

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/322//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/322//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/322//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12504678/HBASE-4832.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -162 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 65 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.client.TestAdmin

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/323//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/323//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/323//console

          This message is automatically generated.

          Nicolas Liochon added a comment -

          One small comment: the timeout on the method (@Test(timeout=timeout)) conflicts with the timeout of the sleep (Thread.sleep(timeout)). Since both are set to the same value (30 seconds), either one can fire first, which makes failure analysis harder. I think we can remove the timeout on the method; the test itself ensures that it won't run forever.

          stack added a comment -

          I can address the N comment on commit? (Removing the @Test timeout).

          stack added a comment -

          And N, you want to uncomment this section now? This patch wants to do it.

            public void stop(final String msg) {
              this.stopped = true;
              LOG.info("STOPPED: " + msg);
              // Wakes run() if it is sleeping
              //sleeper.skipSleepCycle();
              //will be uncommented later, see discussion in jira 4798
            }
          
          Eugene Koontz added a comment -

          @stack, that is fine, thanks.
          -Eugene

          Eugene Koontz added a comment -

          @stack, I tried with "sleeper.skipSleepCycle()" uncommented and commented; test consistently succeeded 30+ iterations in both cases.

          Eugene Koontz added a comment -

          -Removes (timeout=30000) from @Test per nkeywal's suggestion.
          -Add LOG.debug() concerning where interrupt occurs.

          Eugene Koontz added a comment -

          git diff --no-prefix

          Nicolas Liochon added a comment -

          FYI, the patch for the region server itself is in HBASE-4833; if trunk has changed, I will update the patch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12504809/HBASE-4832.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -162 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.client.TestAdmin
          org.apache.hadoop.hbase.master.TestDistributedLogSplitting

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/339//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/339//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/339//console

          This message is automatically generated.

          Nicolas Liochon added a comment -

          I believe this patch could be integrated?

          Ted Yu added a comment -

          This incorporates nkeywal's earlier patch to this JIRA, and allows TestRegionServerCoprocessorExceptionWithAbort to work with it. It changes the test to use a ZooKeeper watcher in a separate thread to watch for the regionserver to abort. (This is also what is currently done with TestMasterCoprocessorWithAbort()).

          In Eugene's testing, repeated iterations (30+) of TestRegionServerCoprocessorExceptionWithAbort succeed.

          Ted Yu added a comment -

          Integrated to TRUNK.

          Thanks for the patch, Eugene.

          Thanks for the review Stack and N.

          Hudson added a comment -

          Integrated in HBase-TRUNK #2485 (See https://builds.apache.org/job/HBase-TRUNK/2485/)
          HBASE-4832 TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast

          tedyu :
          Files :

          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java
          Eugene Koontz added a comment -

          Thanks to Ted for commit, Stack for reviews and nkeywal for filing and helping diagnose these problems with this test.

          Hudson added a comment -

          Integrated in HBase-TRUNK-security #11 (See https://builds.apache.org/job/HBase-TRUNK-security/11/)
          HBASE-4832 TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast

          tedyu :
          Files :

          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java

            People

            • Assignee:
              Eugene Koontz
              Reporter:
              Nicolas Liochon
            • Votes:
              0
              Watchers:
              2
