Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.3.0
    • Fix Version/s: 0.3.0
    • Component/s: service/hbase
    • Labels:
      None
    1. WHIRR-25.patch
      56 kB
      Lars George

        Activity

        Tom White added a comment -

        I've just committed this. Thanks Lars!

        Lars George added a comment -

        Here's a patch for WHIRR-25. I agree, we can get this in and then test it thoroughly and open new issues against it. I hope the patch format is OK, my git-fu is still weak.

        Jeff Hammerbacher added a comment -

        Yes, I'd love to play around with a patch for Christmas!

        Tom White added a comment -

        > ... this is working fine now ...

        Great! How about putting a patch up which can be checked in, then the follow on work can go into separate JIRAs. Does that sound reasonable?

        Lars George added a comment -

        OK, I added WHIRR-174 (and WHIRR-175 in the process). Both fix the stand-alone ZooKeeper setup, and this is working fine now using, for example, a config like:

        whirr.cluster-name=hbasetest
        whirr.instance-templates=1 zk,1 jt+nn+hbase-master,5 dn+tt+hbase-regionserver
        whirr.provider=ec2
        whirr.identity=...
        whirr.credential=...
        whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
        whirr.hardware-id=m1.large
        whirr.image-id=us-east-1/ami-da0cf8b3
        whirr.location-id=us-east-1
        whirr.run-url-base=http://testwhirr.s3.amazonaws.com/whirr-trunk/
        

        Note: The "testwhirr" bucket is holding the content of the local "script/" directory.

        Next up is ZK managed by HBase, which is done like so:

        • Check if there is no "zk" role in the template -> set HBaseManagesZK = true
        • Emit a zoo.cfg from the HBase post-configure script next to the hbase-site.xml
        • Set the option in hbase-env.sh to indicate that HBase manages ZK:
        HBASE_MANAGES_ZK=true
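
The role check described in the steps above can be sketched as follows. This is only an illustration in Python, not the code from the patch: the function names are hypothetical, and it assumes the `whirr.instance-templates` syntax shown in the sample config (comma-separated groups, roles joined with `+`).

```python
def parse_instance_templates(value):
    """Parse a whirr.instance-templates value into (count, roles) pairs."""
    templates = []
    for group in value.split(","):
        # Each group looks like "5 dn+tt+hbase-regionserver".
        count, roles = group.strip().split(" ", 1)
        templates.append((int(count), set(roles.split("+"))))
    return templates


def hbase_manages_zk(templates, zk_role="zk"):
    """Return True when no template carries the stand-alone ZooKeeper role."""
    return not any(zk_role in roles for _, roles in templates)


def hbase_env_line(manages_zk):
    """Render the hbase-env.sh line telling HBase whether to run ZK itself."""
    return "export HBASE_MANAGES_ZK=" + ("true" if manages_zk else "false")
```

With the sample config above, `hbase_manages_zk` would return `False` because a `zk` role is present; dropping the `1 zk` group would flip it to `True`.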
        Lars George added a comment -

        As per Tom's suggestions:

        > Setting port numbers is being discussed in WHIRR-168. You might look there and WHIRR-55 for naming conventions that are compatible with the approaches being taken in those JIRAs.

        Done.

        > DnsUtil is in core now so you can use that rather than a copy.

        Removed, and the global one is referenced.

        > Can we share code between HadoopProxy and HBase proxy? They seem to be almost the same.

        Removed mine and am using the HadoopProxy.

        > Nit: In BasicServerClusterActionHandler make the instance variables final?

        Done.

        Tom White added a comment -

        This looks great so far. A few quick, minor comments:

        • Setting port numbers is being discussed in WHIRR-168. You might look there and WHIRR-55 for naming conventions that are compatible with the approaches being taken in those JIRAs.
        • DnsUtil is in core now so you can use that rather than a copy.
        • Can we share code between HadoopProxy and HBase proxy? They seem to be almost the same.
        • Nit: In BasicServerClusterActionHandler make the instance variables final?
        Lars George added a comment - edited

        Known issues or things to test:

        • Stand-alone ZooKeeper support

        The zk scripts need some additions to check how many servers in the ensemble are handed in, and then create either a simplified standalone zoo.cfg or the distributed one, which is currently generated in every case.

        • HBase manages ZK

        We need to detect when no ZK role was specified and then have HBase start it implicitly. We may even add specific roles (hbase-zookeeper) to indicate this explicitly.

        • Run Rest/Thrift/Avro Server on other HBase machines

        Check that the scripts handle the combined setup fine.

        • Ganglia Setup

        I have currently commented out a few lines that were used by the hbase-ec2 script, which I used as a template. We should check and either remove or enable them.
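
For the stand-alone ZooKeeper item above, the choice between the two zoo.cfg variants could look roughly like this. It is a sketch in Python rather than the actual zk shell scripts; the function name, the default data directory, and the timing values are assumptions, while the keys themselves (`tickTime`, `dataDir`, `clientPort`, `initLimit`, `syncLimit`, `server.N`) are standard ZooKeeper settings.

```python
def render_zoo_cfg(servers, data_dir="/var/lib/zookeeper", client_port=2181):
    """Render zoo.cfg text for a list of ZooKeeper host names.

    One server yields a standalone config; several yield a replicated one.
    """
    if not servers:
        raise ValueError("at least one ZooKeeper server is required")
    lines = [
        "tickTime=2000",
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
    ]
    if len(servers) > 1:
        # Quorum settings only make sense for a replicated ensemble.
        lines += ["initLimit=10", "syncLimit=5"]
        lines += [
            f"server.{i}={host}:2888:3888"
            for i, host in enumerate(servers, start=1)
        ]
    return "\n".join(lines) + "\n"
```

A replicated ensemble additionally needs a matching `myid` file on each server, which this sketch does not cover.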

        Lars George added a comment -

        Slowly making progress: https://github.com/larsgeorge/whirr/commits/trunk

        Thanks to Tom's recent cleanups and improvements this is really straightforward, and I am relying on the service composition feature to make this all work. Tests next.

        Tom White added a comment -

        Also, configuration using the approach in WHIRR-55 provides more flexibility.

        Tom White added a comment -

        BTW when the work in WHIRR-117 is done it should be a lot easier to add HBase, since the ZooKeeper and Hadoop roles can be composed, and the amount of code needed to implement a new service will be much reduced.

        Tom White added a comment -

        Thanks for working on this Lars. Perhaps post your progress so far in case someone else wants to have a look? (I'm certainly interested.)

        Lars George added a comment -

        Hi Tom,

        I am trying, but only have a few cycles. So if someone else wants to take over, that is fine by me. If no one else wants to, I will keep working on this, as that seems better than nothing.

        Lars

        Tom White added a comment -

        How's this progressing Lars? I've updated the fix version as it would be nice to get HBase support in the next release of Whirr.

        Lars George added a comment -

        I agree, I had the same thought when I wrote it down above. Will do.

        Tom White added a comment -

        This sounds right. How about making the role names unique to HBase, e.g. by adding an "hb" prefix?

        Lars George added a comment -

        One more thing: what about Thrift/REST/Avro servers that should be started at startup? Should we add a "role" for those too, like

        1 jt,nn,hm,zk,thrift,avro,rest ...
        

        or some such?

        Andrew Purtell added a comment -

        > For the former it seems that Andy has set up a bucket that holds the tarballs etc. Could you confirm Andy? What is in there?

        Whatever you want can go in. I have the tarballs for building with the old bash scripts there.

        Better to go with getting artifacts published into the whirr bucket.

        > Is it assumed that the HBase service always spins up an embedded HDFS?

        I might want multiple HBase services on a shared HDFS service, using a shared ZooKeeper service ... and authenticating against a shared KDC, but with that last one I'm just piling on.

        Lars George added a comment -

        OK, makes sense. I would follow the steps you have in the other services, i.e. install the various requirements as a step in the overall process. Well, really that would be just Java for now, and that you have already covered.

        Tom White added a comment -

        > Another question. The hbase-ec2 scripts had a helper that would create an image with java, hbase and so on installed that then could be used as a starting point. With Whirr it seems that the image is a public base image with Java on it. The init script then installs the hadoop packages (which is the same with the hadoop cloud scripts). What do you suggest as best practices?

        I suggest that you have an initialization script that installs HBase on a base vanilla image. In the future it would be nice to be able to build images that have had the initialization script run; see WHIRR-88.

        On your other questions - looks good to me, but perhaps others can weigh in?

        Lars George added a comment -

        And more questions. Is it assumed that the HBase service always spins up an embedded HDFS? Or is that optional, so that it may use the Hadoop service (while losing locality guarantees, I suppose)? For the former, we currently in hbase-ec2 simply specify "name", "slaves" and "zoos". This would need translating to the more role-based syntax in Whirr. That could result in something like:

        1) 1 jt,nn,hm 10 tt,dn,rs
        2) 1 jt,nn 1 hm 10 tt,dn 10 rs

        and so on. Does that make sense? And if we add ZooKeeper as well to be self contained we get

        1) 3 zk 1 jt,nn,hm 10 tt,dn,rs
        2) 3 zk 1 jt,nn 1 hm 10 tt,dn 10 rs
        3) 1 jt,nn,hm,zk 5 tt,dn,rs

        Agreeable?
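
To make the proposed layouts concrete, here is a small Python sketch that reads the notation used in this comment (alternating instance count and comma-separated roles) and tallies instances per role. The notation is only the proposal from this comment, not necessarily the final Whirr syntax, and the function names are hypothetical.

```python
from collections import Counter


def parse_layout(layout):
    """Parse e.g. "3 zk 1 jt,nn,hm 10 tt,dn,rs" into (count, roles) pairs."""
    tokens = layout.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating <count> <roles> tokens")
    groups = []
    for count, roles in zip(tokens[0::2], tokens[1::2]):
        groups.append((int(count), roles.split(",")))
    return groups


def instances_per_role(groups):
    """Count how many instances run each role."""
    totals = Counter()
    for count, roles in groups:
        for role in roles:
            totals[role] += count
    return totals


def total_instances(groups):
    """Total number of machines the layout would start."""
    return sum(count for count, _ in groups)
```

For example, the layout `3 zk 1 jt,nn,hm 10 tt,dn,rs` describes 14 machines, 10 of which run a region server.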

        Lars George added a comment -

        Another question. The hbase-ec2 scripts had a helper that would create an image with java, hbase and so on installed that then could be used as a starting point. With Whirr it seems that the image is a public base image with Java on it. The init script then installs the hadoop packages (which is the same with the hadoop cloud scripts). What do you suggest as best practices?

        Also, there is still the question about ZooKeeper: hbase-ec2 spins it up and passes the details into the hbase instances. It looks as though there is at least no pseudo-distributed or embedded ZK option; it spins up separate instances for ZK. That simplifies things and points towards making use of the ZK service already in Whirr. I still need to figure out whether this can be chained, or whether I can access the quorum details so they can be passed on to the hbase service. Ideally there is a forced option on the hbase service that tells it where to find the ZK service, so that it can check if it is up and running (and optionally start it) and also get the quorum details to pass on to the hbase init script.

        Does that make sense?
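
As an illustration of passing the quorum details on, the following Python sketch turns a list of ZooKeeper hosts into hbase-site.xml properties. The property names `hbase.cluster.distributed`, `hbase.zookeeper.quorum` and `hbase.zookeeper.property.clientPort` are standard HBase settings; the function names and the way the host list is obtained are assumptions, not something taken from the Whirr code.

```python
from xml.sax.saxutils import escape


def zk_quorum_properties(hosts, client_port=2181):
    """Build the HBase properties that point at an external ZK ensemble."""
    return {
        "hbase.cluster.distributed": "true",
        "hbase.zookeeper.quorum": ",".join(hosts),
        "hbase.zookeeper.property.clientPort": str(client_port),
    }


def render_site_xml(properties):
    """Render a dict of properties as hbase-site.xml text."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in properties.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"
```

In a real setup the host list would come from whatever the ZK service reports about its running instances.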

        Lars George added a comment -

        Thanks Tom, makes sense and the Whirr bucket part was just for reference. I was more after the HBase one and asking here for what is in it etc.

        I will ping Andy offline about it.

        Tom White added a comment -

        Generally speaking, during development it's easiest to have your own S3 bucket to hold scripts in. Then a committer can move the artifacts to the whirr bucket at commit time. Does this sound OK?

        Lars George added a comment -

        One of the things we need to address is how we set up HBase, i.e. Apache or CDH style. I am doing them in that order so the user has a choice. For the former it seems that Andy has set up a bucket that holds the tarballs etc. Could you confirm Andy? What is in there? Who is maintaining it? Should we use that here as well? I am still trying to figure out how S3 public buckets work; is there a good place to get the gist? I saw that the Whirr one is actually browsable with s3cmd, but the HBase one is not.

        Tom White added a comment -

        Trunk is good.

        Lars George added a comment -

        OK, I forked the project on GitHub and started with ZooKeeper as a stencil. What version should I work against? I am currently on trunk, but that may be wrong. Should we update this issue to reflect the version?

        Tom White added a comment -

        > Now with Whirr we have a service we can implement plus the run-urls, which are what the init-remote scripts used to be, but downloaded from a remote site as opposed to being compressed and piggybacked into the instance. Am I right here so far?

        Yes, exactly right. In the future it would be nice to have the ability to push a local copy of the scripts out to the cluster (WHIRR-99), but you don't need to worry about that in this issue.

        > Or is the HBase service supposed to start ZooKeeper if the user wishes to do so? Also, Tom, is there a way to daisy chain services?

        It would be nice if HBase could use the ZK service to avoid duplication. Could the HBase service delegate to the ZK service to start and stop it? Patrick was talking about a more flexible model for starting services, so perhaps he's got some comments here too.

        The other thing I would say is separate the installation script from the configuration script, so in the future we can do an optimization where we build pre-installed images. We have this separation in ZK, but not currently in Hadoop, although WHIRR-87 will fix that.

        Thanks for looking at this Lars!

        Lars George added a comment -

        I started looking into this after talking to Andy and Tom. I think this should be straightforward, but I would like to capture your opinions here. Previously we used start scripts (init-remote scripts) that were adjusted to work for HBase. Now with Whirr we have a service we can implement plus the run-urls, which are what the init-remote scripts used to be, but downloaded from a remote site as opposed to being compressed and piggybacked into the instance. Am I right here so far?

        Andy, given you have implemented the previous hbase-ec2 scripts and seeing what you pointed out above, is there anything you would like to add? Otherwise I would see what you have added to the hbase-ec2 scripts and port this over to Whirr.

        Also, there is already a ZooKeeper service for Whirr. Should that be used and then simply handed into the HBase service so it can find the ensemble? Or is the HBase service supposed to start ZooKeeper if the user wishes to do so? Also, Tom, is there a way to daisy chain services?

        I may be off here, still grokking the details. Please feel free to point me in the right direction.

        Jeff Hammerbacher added a comment -

        Hey Andrew,

        I'm just poking around for now. Using Whirr to start HBase clusters would be a "nice to have" for a current project. There's some talk of moving these bash scripts to Chef, or at least providing Chef recipes as an alternative. Would that be a direction of interest to you?

        Later,
        Jeff

        Andrew Purtell added a comment -

        Note substitutions are performed on hbase-ec2-init-remote.sh by the client after it learns the locations of the ZK quorum ensemble peers, and later the HDFS namenode and HBase master. Then the result is shipped to slaves using EC2's "user data" facility. The AMI pulls the user data and executes the script found there.
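
The client-side substitution described above can be illustrated with a short Python sketch. It is not the hbase-ec2 code: the template text, the placeholder names, and the function name are all hypothetical, and the step that ships the rendered script as EC2 user data is left out.

```python
from string import Template

# Hypothetical fragment of an init script; placeholder names are illustrative.
INIT_SCRIPT_TEMPLATE = Template("""#!/bin/sh
ZOOKEEPER_QUORUM="$zookeeper_quorum"
MASTER_HOST="$master_host"
""")


def render_init_script(zookeeper_hosts, master_host):
    """Fill in the addresses the client learned after the nodes came up."""
    # substitute() raises KeyError if a placeholder is left without a value,
    # which avoids shipping a half-filled script to the slaves.
    return INIT_SCRIPT_TEMPLATE.substitute(
        zookeeper_quorum=",".join(zookeeper_hosts),
        master_host=master_host,
    )
```

The rendered text is what would then be handed to the instances, so they start with the quorum and master locations already in place.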

        I have it on my to do list to start investigating what facilities are available in Whirr already for supporting this, and plan to start next week.

        Andrew Purtell added a comment -

        Jeff, are you working on this?

        The create-hbase-image-remote script unpacks tarballs for HBase and LZO into place. However, hbase-ec2-init-remote.sh does a lot of config substitution, and those are the important pieces: it tells the master and slaves where the ZK quorum is located, and it sets runtime settings in the site XML file and in the environment. Environment settings should be conditional on the union of detected attributes of the target (virtual) hardware and user preferences.
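
One way to read the last point is sketched below in Python: defaults are derived from detected hardware attributes, and explicit user preferences override them. The half-of-memory heuristic, the `memory_mb` key, and the function names are assumptions made for illustration; `HBASE_HEAPSIZE` is the hbase-env.sh variable for the heap size in MB.

```python
def hbase_env_settings(detected, preferences):
    """Combine detected hardware attributes with user preferences.

    detected:    e.g. {"memory_mb": 7680}, as probed on the instance.
    preferences: environment variables the user set explicitly; these win.
    """
    settings = {}
    memory_mb = detected.get("memory_mb")
    if memory_mb:
        # Assumed heuristic: give HBase half of the machine's memory.
        settings["HBASE_HEAPSIZE"] = str(memory_mb // 2)
    settings.update(preferences)
    return settings


def render_env(settings):
    """Render the settings as export lines for hbase-env.sh."""
    return "".join(f"export {key}={value}\n" for key, value in sorted(settings.items()))
```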

        Jeff Hammerbacher added a comment -

        To create a script similar to scripts/apache/hadoop/install or scripts/apache/zookeeper/install for HBase, it looks like http://github.com/apurtell/hbase-ec2/blob/master/bin/image/create-hbase-image-remote has the important pieces.

        Jeff Hammerbacher added a comment -

        The HBase scripts for EC2 currently live on Github at http://github.com/apurtell/hbase-ec2.


          People

          • Assignee:
            Lars George
            Reporter:
            Tom White
          • Votes:
            0
            Watchers:
            7
