Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.3.0
    • Fix Version/s: 0.4.0
    • Component/s: service/hbase
    • Labels:
      None

      Description

      I get the following stacktrace consistently on EC2:

      java.lang.NullPointerException
              at org.apache.hadoop.hbase.client.HConnectionManager$ClientZKWatcher.getZooKeeperWrapper(HConnectionManager.java:231)
              at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getZooKeeperWrapper(HConnectionManager.java:1048)
              at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:1064)
              at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:668)
              at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:644)
              at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:770)
              at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:673)
              at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:644)
              at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:136)
              at org.apache.whirr.service.hbase.integration.HBaseServiceController.waitForMaster(HBaseServiceController.java:104)
              at org.apache.whirr.service.hbase.integration.HBaseServiceController.startup(HBaseServiceController.java:86)
              at org.apache.whirr.service.hbase.integration.HBaseServiceController.ensureClusterRunning(HBaseServiceController.java:66)
              at org.apache.whirr.service.hbase.integration.HBaseServiceTest.setUp(HBaseServiceTest.java:45)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
              at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
              at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
              at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
              at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
              at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
              at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:59)
              at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:115)
              at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:102)
              at org.apache.maven.surefire.Surefire.run(Surefire.java:180)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:350)
              at org.apache.maven.surefire.booter.SurefireBooter.main(SurefireBooter.java:1021)
      
      1. WHIRR-201-v6.patch
        9 kB
        Lars George
      2. WHIRR-201-v4.patch
        9 kB
        Lars George
      3. WHIRR-201-v3.patch
        10 kB
        Lars George
      4. WHIRR-201-v2.patch
        9 kB
        Lars George
      5. WHIRR-201.patch
        8 kB
        Lars George

          Activity

          Andrei Savu added a comment -

          Same error on Rackspace.

          Tom White added a comment -

          Lars tells me that the test is failing even though the HBase service is working fine. If it is the case that the test is the problem, and no fix is forthcoming, then we could release in this state.

          Andrei Savu added a comment -

          I am seeing the following error log messages before the cluster is destroyed:

          2011-01-19 08:05:00,542 INFO  [org.apache.whirr.service.hbase.integration.HBaseServiceController] (main) Waiting for master...
          2011-01-19 08:05:00,673 ERROR [org.apache.hadoop.hbase.zookeeper.HQuorumPeer] (main) no clientPort found in zoo.cfg
          2011-01-19 08:05:00,673 ERROR [org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper] (main)
          <org.apache.hadoop.hbase.client.HConnectionManager>Error creating a ZooKeeperWrapper java.io.IOException:
           Could not read quorum servers from zoo.cfg
          
          Andrei Savu added a comment -

          I suspect that the local hbase-site.xml is never read. I will investigate more later.
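          One way to check which hbase-site.xml a test actually picks up is to list every copy visible on the classpath. The following is a minimal sketch using only the JDK; the class and method names are hypothetical and are not part of Whirr or HBase.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper, not part of Whirr or HBase.
final class ClasspathResourceLocator {

    // Returns every classpath location that provides the named resource,
    // in the order the class loader reports them.
    static List<URL> locate(String resourceName) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return Collections.list(loader.getResources(resourceName));
    }

    public static void main(String[] args) throws IOException {
        // More than one line of output means several jars or directories
        // ship a file with this name.
        for (URL url : locate("hbase-site.xml")) {
            System.out.println(url);
        }
    }
}
```

          If more than one URL is printed, a copy bundled inside a jar may be shadowing the local file.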

          Andrei Savu added a comment -

          I believe I have finally got past the original error by making this change:

          diff --git a/services/hbase/src/test/java/org/apache/whirr/service/hbase/integration/HBaseServiceController.java b/services/hbase/src/test/java/org/apache/whirr/service/hbase/integration/HBaseServiceController.java
          index 0fab9a0..d473fa1 100644
          --- a/services/hbase/src/test/java/org/apache/whirr/service/hbase/integration/HBaseServiceController.java
          +++ b/services/hbase/src/test/java/org/apache/whirr/service/hbase/integration/HBaseServiceController.java
          @@ -81,6 +81,8 @@ public class HBaseServiceController {
               proxy.start();
          
               Configuration conf = getConfiguration();
          +    conf.set("hbase.zookeeper.property.clientPort", "2181");
          +    conf.set("clientPort", "2181");
               waitForMaster(conf);
               running = true;
             }
          

          but unfortunately it's still failing with a different error:

          -------------------------------------------------------------------------------
          Test set: org.apache.whirr.service.hbase.integration.HBaseServiceTest
          -------------------------------------------------------------------------------
          Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1,022.614 sec <<< FAILURE!
          org.apache.whirr.service.hbase.integration.HBaseServiceTest  Time elapsed: 0 sec  <<< ERROR!
          org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region because: org.apache.hadoop.security.UserGroupInformation.getCurrentUser()Lorg/apache/hadoop/security/UserGroupInformation;
            at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:1107)
            at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:668)
            at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:644)
            at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:770)
            at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:673)
            at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:644)
            at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:136)
            at org.apache.whirr.service.hbase.integration.HBaseServiceController.waitForMaster(HBaseServiceController.java:104)
          

          I think we should fix this before doing a release. The failure may signal a deeper problem.

          Andrei Savu added a comment -

          +1 for fixing this in 0.4.0. I have tested the HBase cluster by running YCSB on the servers and everything worked as expected.

          Tom White added a comment -

          Thanks for running YCSB, Andrei. BTW have you seen http://search-hadoop.com/m/29uW71lgVAq1/YCSB+tests+for+HBase+on+Whirr&subj=YCSB+tests+for+HBase+on+Whirr+was+Report+to+Apache+board+first+cut+

          Andrei Savu added a comment -

          No. Awesome stuff!

          Lars George added a comment -

          Darn, I am missing these updates all the time! They seem to be emailed somewhere else.

          Anyhow, yes, the issue is https://issues.apache.org/jira/browse/HBASE-3143: the test hbase-site.xml is loaded (from hbase-test.jar), and it contains a ZooKeeper port of 21810 (as opposed to 2181). So your "hack", Andrei, is correct; I have done the same to proceed. Currently I am facing http://search-hadoop.com/m/sPdqNFAwyg2, where the region server is reported with its internal EC2 address. So doing this fails:

          $ HBASE_CONF_DIR=~/.whirr/hbaseclustertest/ ~/projects/opensource/hbase-0.89.20100924/bin/hbase shell
          HBase Shell; enter 'help<RETURN>' for list of supported commands.
          Type "exit<RETURN>" to leave the HBase Shell
          Version: 0.89.20100924, r1001068, Tue Oct  5 12:12:44 PDT 2010
          
          hbase(main):001:0> list
          TABLE                                                                                                                                                                      
          11/01/25 14:22:15 ERROR hbase.HServerAddress: Could not resolve the DNS name of ip-10-114-145-167.ec2.internal:60020
          11/01/25 14:22:16 ERROR hbase.HServerAddress: Could not resolve the DNS name of ip-10-114-145-167.ec2.internal:60020
          11/01/25 14:22:17 ERROR hbase.HServerAddress: Could not resolve the DNS name of ip-10-114-145-167.ec2.internal:60020
          ...
          

          This also makes the test fail as it cannot talk to the server serving -ROOT-.

          Otherwise the clusters work, as you noted, but the test needs to map the IPs from local (internal) to external, or add the remote ones to the internal DNS lookup service. Is that possible at all?
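
          To illustrate what mapping internal addresses to external ones would involve, here is a minimal sketch. It assumes the EC2 naming convention seen in the log above (ip-A-B-C-D.ec2.internal encodes the private IP A.B.C.D) and a lookup table from private IP to public hostname that the caller would have to build from cluster metadata. The class, its methods, and the table are hypothetical; this is not what the attached patches do.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: translates EC2-internal hostnames to public ones.
final class Ec2AddressTranslator {

    // Matches names such as "ip-10-114-145-167.ec2.internal" (no port suffix).
    private static final Pattern INTERNAL =
        Pattern.compile("^ip-(\\d+)-(\\d+)-(\\d+)-(\\d+)\\..*$");

    // Extracts the private IP encoded in an internal hostname, if any.
    static Optional<String> privateIp(String internalHost) {
        Matcher m = INTERNAL.matcher(internalHost);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(
            m.group(1) + "." + m.group(2) + "." + m.group(3) + "." + m.group(4));
    }

    // Returns the public hostname for an internal one, or the input unchanged
    // when the name is not internal or the table has no entry for it.
    static String toExternal(String internalHost, Map<String, String> privateToPublic) {
        return privateIp(internalHost)
            .map(privateToPublic::get)
            .orElse(internalHost);
    }
}
```

          Even with such a table, the HBase client resolves region server addresses itself, so a translation like this would need a hook inside the client, which is part of why this route is hard from a test machine.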

          Lars George added a comment -

          After talking to Tom and looking at the options Andy presents, I will change the test to start a REST server and ping the tables that way.

          Lars George added a comment -

          Patch switches to use REST since otherwise traversal from a test machine to the cluster is too difficult to achieve (see above notes).
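
          The general shape of a REST-based readiness check can be sketched as a bounded polling loop around an HTTP probe. This is an illustration only, using JDK classes; the class name, method names, timeouts, and any URL passed in are assumptions, and the actual patch may be structured differently.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.function.BooleanSupplier;

// Hypothetical helper: polls a probe until it succeeds or attempts run out.
final class ReadinessPoller {

    // Calls the probe up to maxAttempts times, sleeping between attempts.
    // A probe that throws is treated as "not ready yet".
    static boolean waitUntilReady(BooleanSupplier probe, int maxAttempts, long sleepMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (probe.getAsBoolean()) {
                    return true;
                }
            } catch (RuntimeException e) {
                // Fall through and retry.
            }
            if (attempt < maxAttempts) {
                Thread.sleep(sleepMillis);
            }
        }
        return false;
    }

    // Returns true if a GET on the URL answers with HTTP 200.
    static boolean httpOk(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            conn.setRequestMethod("GET");
            try {
                return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            return false;
        }
    }
}
```

          A caller would combine the two, for example waitUntilReady(() -> httpOk(restUrl), 30, 1000L), where restUrl is whatever endpoint the REST server exposes on the master's public address. Only one publicly reachable host and port is needed, which sidesteps the internal-address problem described above.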

          Lars George added a comment -

          Adds SOCKS factory details to the HBase configuration. This does not help with everything, but it should be there nevertheless.
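
          For readers unfamiliar with what a SOCKS socket factory does, the sketch below shows the underlying idea with plain JDK classes: sockets are created against a SOCKS proxy, so connections are tunnelled through it rather than opened directly. The class name, host, and port are hypothetical. In Hadoop the equivalent is normally configured through properties such as hadoop.rpc.socket.factory.class.default and hadoop.socks.server; whether the patch sets exactly these is an assumption.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.Socket;

// Hypothetical illustration of SOCKS-routed sockets; not Hadoop's factory.
final class SocksSockets {

    // Describes a SOCKS proxy listening on the given host and port.
    // The address is left unresolved so no DNS lookup happens here.
    static Proxy socksProxy(String host, int port) {
        return new Proxy(Proxy.Type.SOCKS, InetSocketAddress.createUnresolved(host, port));
    }

    // Creates an unconnected socket whose later connect() calls go via the proxy.
    static Socket newSocket(Proxy proxy) {
        return new Socket(proxy);
    }
}
```

          As the comment above says, this only covers connections that are created through the factory, so it does not resolve every reachability problem.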

          Lars George added a comment -

          v3 patch adds missing pom.xml deps.

          Lars George added a comment -

          v4 patch adds missing changes to config. Sorry for the flood, but I messed up my local commits and am backtracking.

          Lars George added a comment -

          v6 should be the patch to use. I ran it against trunk and the integration test succeeds.

          ./services/hbase [svn:]$ mvn verify -Pintegration -DargLine="-Dwhirr.test.provider=ec2 -Dwhirr.test.identity=$AWS_ACCESS_KEY_ID -Dwhirr.test.credential=$AWS_SECRET_ACCESS_KEY"
          [INFO] Scanning for projects...
          [INFO] ------------------------------------------------------------------------
          [INFO] Building Apache Whirr HBase
          [INFO]    task-segment: [verify]
          [INFO] ------------------------------------------------------------------------
          ...
          2011-01-28 12:08:22,678 INFO  [org.apache.whirr.service.hbase.integration.HBaseServiceController] (main) Waiting for master...
          Warning: Permanently added 'ec2-50-16-38-27.compute-1.amazonaws.com,50.16.38.27' (RSA) to the list of known hosts.
          2011-01-28 12:08:33,256 INFO  [org.apache.whirr.service.hbase.integration.HBaseServiceController] (main) Master reported in. Continuing.
          Done.
          2011-01-28 12:08:35,571 INFO  [org.apache.whirr.service.hbase.integration.HBaseServiceController] (main) Shutting down cluster...
          2011-01-28 12:08:35,573 INFO  [org.apache.whirr.cluster.actions.DestroyClusterAction] (main) Destroying hbaseclustertest cluster
          2011-01-28 12:09:03,497 INFO  [org.apache.whirr.cluster.actions.DestroyClusterAction] (main) Cluster hbaseclustertest destroyed
          Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 378.75 sec
          
          Results :
          
          Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
          
          [WARNING] File encoding has not been set, using platform encoding MacRoman, i.e. build is platform dependent!
          [INFO] [failsafe:verify {execution: verify}]
          [INFO] Failsafe report directory: /Users/larsgeorge/projects/opensource/whirr-asf-trunk-rw/services/hbase/target/failsafe-reports
          [WARNING] File encoding has not been set, using platform encoding MacRoman, i.e. build is platform dependent!
          [INFO] ------------------------------------------------------------------------
          [INFO] BUILD SUCCESSFUL
          [INFO] ------------------------------------------------------------------------
          [INFO] Total time: 6 minutes 39 seconds
          [INFO] Finished at: Fri Jan 28 12:09:03 CET 2011
          [INFO] Final Memory: 64M/125M
          [INFO] ------------------------------------------------------------------------
          
          Tom White added a comment -

          I ran the test and it passed for me. I've just committed this (with a few minor imports changed to satisfy checkstyle). Thanks, Lars!


            People

            • Assignee:
              Lars George
              Reporter:
              Tom White
            • Votes:
              0
              Watchers:
              0
