WHIRR-240

[HBase] Enable support for HBase 0.90.x

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.3.0
    • Fix Version/s: 0.6.0
    • Component/s: service/hbase
    • Labels: None

      Description

      HBase 0.90.0 is a difficult release, as it needs either CDH or a patched Hadoop (with append) to work. The Apache tarballs won't do, and HBase will not start.

      One possible way is to deploy the Apache Hadoop 0.20.2 tarball and then override the core jar with the one supplied by HBase. Since HBase relies on Hadoop being set up by the Whirr service, we would need some surgery that would imply service ordering.
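      For illustration only, that override could look something like the following (the install paths and jar names are assumptions, not taken from any Whirr script; HBase 0.90 bundles an append-capable hadoop-core jar in its lib dir):

      # hypothetical sketch; install locations and jar names are assumptions
      HADOOP_HOME=/usr/local/hadoop-0.20.2
      HBASE_HOME=/usr/local/hbase-0.90.0
      # replace Hadoop's own core jar with the append-capable one bundled with HBase
      rm $HADOOP_HOME/hadoop-0.20.2-core.jar
      cp $HBASE_HOME/lib/hadoop-core-*append*.jar $HADOOP_HOME/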

      1. WHIRR-240.patch
        2 kB
        Bruno Dumon
      2. hbase-ec2-090.properties
        2 kB
        Bruno Dumon
      3. WHIRR-240-tests.patch
        10 kB
        Bruno Dumon
      4. WHIRR-240.patch
        14 kB
        Bruno Dumon

        Activity

        Bruno Dumon added a comment -

        I added a patch which should solve the problem.

        It copies (actually links) the hadoop-core jar of the installed Hadoop version into the HBase lib dir, which is the opposite of what was suggested in the description of this issue. Cloudera does exactly the same in their Linux packages.

        This solution requires that Hadoop is installed on each node where HBase is installed. Usually this is the case (hadoop-namenode+hbase-master and hadoop-datanode+hbase-regionserver).
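        As a rough sketch of what that linking amounts to (the install paths and jar globs here are assumptions, not the actual patch):

        # assumed install locations; the real script may derive these differently
        HADOOP_HOME=/usr/local/hadoop
        HBASE_HOME=/usr/local/hbase
        # remove the hadoop-core jar bundled with HBase...
        rm $HBASE_HOME/lib/hadoop-core-*.jar
        # ...and link in the core jar of the Hadoop that is actually installed
        ln -s $HADOOP_HOME/hadoop-*core*.jar $HBASE_HOME/lib/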

        The usual suggestion in the HBase community is to use the CDH Hadoop version. You can combine stock HBase with CDH Hadoop by using the following properties:

        whirr.hbase.tarball.url=http://apache.cu.be//hbase/hbase-0.90.3/hbase-0.90.3.tar.gz
        whirr.hadoop.tarball.url=http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u0.tar.gz

        Note that this uses Cloudera's .tar.gz release, not Whirr's special cdh support, which uses the Linux packages instead. I do it that way because the CDH Linux packages use different conventions from Whirr (e.g. different Linux users).

        I tried this on a small byon cluster and it seems to work fine. It doesn't seem to need the 'wait for namenode' loop before starting HBase as suggested in WHIRR-334, though that might be due to byon timing differences.

        Andrei Savu added a comment -

        Bruno, thanks for taking the time to contribute this patch.

        We should probably also create a recipe for this that would contain the relevant URLs:

        whirr.hbase.tarball.url=http://apache.cu.be//hbase/hbase-0.90.3/hbase-0.90.3.tar.gz
        whirr.hadoop.tarball.url=http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u0.tar.gz
        

        I will look more into that timing issue tonight.

        Bruno Dumon added a comment -

        Indeed, we need a recipe, something like "hbase-ec2-090.properties". It should be trivial to make, but maybe someone else can double-check this patch (I had trouble testing it on EC2 yesterday, cf. the bootstrap phase timeout problems).
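        A minimal sketch of what such a recipe could contain (the cluster name and instance-template layout are illustrative; the tarball URLs are the ones given above):

        whirr.cluster-name=hbase
        whirr.instance-templates=1 zookeeper+hadoop-namenode+hbase-master,2 hadoop-datanode+hbase-regionserver
        whirr.hbase.tarball.url=http://apache.cu.be//hbase/hbase-0.90.3/hbase-0.90.3.tar.gz
        whirr.hadoop.tarball.url=http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u0.tar.gz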

        Since there was some discussion on the users list as to whether this jar-replacing is a good approach, I'd like to throw in my arguments for adding this:

        • most importantly, HBase users currently looking into Whirr will be disappointed, as they can't run any recent version
        • from Andrew's comment, I take it that even with HBase 0.92 people might want to run different Hadoop versions (e.g. CDH), which will likely still require replacing the hadoop-core jar in HBase, for example because of differences in the RPC protocol.

        Besides this, it is 'the' approach in HBase land, cf. http://hbase.apache.org/book/hadoop.html: 'Replace the hadoop jar found in the HBase lib directory with the hadoop jar you are running on your cluster to avoid version mismatch issues'.

        Bruno Dumon added a comment -

        Added the hbase-ec2-090 recipe. As EC2 is working smoothly for me now, I was able to test it. The only difference is that I tested with Ubuntu 11.04 in eu-west (which shouldn't make a difference).

        The HBase master and region servers are running, and the HBase master web UI shows the region servers.

        Initially during startup there were some errors of the following kind (in the master's log):

        2011-07-14 15:16:51,115 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /hbase/hbase.version could only be replicated to 0 nodes, instead of 1
                at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1469)
                at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        

        presumably because the datanodes were not available yet (which seems like a reason to have ordered service startup after all).

        I didn't put an actual workload against it, but I created a table and saw no further errors in the logs (I checked on the regionserver that hosted the table; it was able to create its DFS files fine).
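        For what it's worth, such a 'wait for namenode' step could be sketched as a loop like the following (a guess at its shape, not the actual WHIRR-334 proposal):

        # block until the namenode answers and at least one datanode has registered
        until hadoop dfsadmin -report 2>/dev/null | grep -q 'Datanodes available: [1-9]'; do
          echo "waiting for HDFS to come up..."
          sleep 5
        done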

        Tom White added a comment -

        Does this still support 0.89 HBase?

        Bruno Dumon added a comment -

        I hadn't gotten around to trying the integration tests before (sorry); I'm looking into them now.

        I identified the following problems:

        • whirr.instance-templates: the thriftserver was run on a node on which there is no Hadoop. This does not work, as we then can't copy over the hadoop-core jar. This will also be a problem for WHIRR-339 (hbase-site.xml generation); I'll add a comment over there. The 'solution' is to adjust the template; I think this is fair enough, since there are other limitations on how templates must be structured.
        • Whirr decides to configure the node with the regionserver first, and only after that the one with ZooKeeper. This fails because the regionserver will only wait/retry for a limited time to connect to ZooKeeper on startup (20s; I checked in the logs that ZooKeeper actually started about 10s after the regionserver exited). Luckily, there is a property to control this: hbase.zookeeper.recoverable.waittime. I set it to 5 minutes (this property only has an effect on startup, AFAICS).

        With these changes, the test runs successfully for me.

        To make the tests work for both 0.89 and 0.90, I had to add an additional test class and properties file, the name of the properties file is passed to HBaseServiceController.getInstance(). Let me know if you prefer to have it done some other way.

        Note that, in services/hbase/pom.xml, I left the hbase.version property at 0.89etcetera, which works for both versions, since the tests use the Thrift interface, which seems to be compatible.

        The tests for 0.89 still run, but since copying hadoop-core is done regardless of the HBase version, the above-mentioned limitation of needing Hadoop installed on the node running Thrift now also applies to 0.89.

        The hbase.zookeeper.recoverable.waittime property is specified in the cluster configuration, which depends on WHIRR-339. Without WHIRR-339, it needs to be hardcoded in configure_(cdh_)hbase.sh.
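        Hardcoded, that could look roughly like the following inside the configure script (how the real script writes hbase-site.xml, and the $HBASE_HOME variable, are assumptions):

        # sketch: inject the property before the closing </configuration> tag
        # of hbase-site.xml; 300000 ms = 5 minutes
        # $HBASE_HOME is assumed to point at the HBase install dir
        sed -i 's|</configuration>|  <property>\n    <name>hbase.zookeeper.recoverable.waittime</name>\n    <value>300000</value>\n  </property>\n</configuration>|' \
          $HBASE_HOME/conf/hbase-site.xml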

        Through the properties passed with -Dconfig to the integration test, I supplied the following:

        whirr.image-id=eu-west-1/ami-619ea915 (canonical 11.04 instance store EU)
        whirr.hardware-id=m1.large
        whirr.location-id=eu-west-1a

        Bruno Dumon added a comment -

        Merged the various aspects of the previous patches and tested against current trunk.

        This patch includes:

        • a change to configure_hbase.sh to replace HBase's hadoop-core jar with that of the actually installed Hadoop
        • a sample recipe
        • separate integration tests for HBase 0.89 and HBase 0.90

        Ran the integration tests with all settings at their defaults (except whirr.hardware-id=m1.large) and tested the recipe manually.

        Andrei Savu added a comment -

        +1 tested on EC2. Thanks Bruno!

        Andrei Savu added a comment -

        I've just committed this. Thanks Bruno! Integration tests also work on cloudservers. We can fix any remaining issues or make changes in new JIRAs.


          People

          • Assignee: Bruno Dumon
          • Reporter: Lars George
          • Votes: 0
          • Watchers: 1
