  1. Apache Whirr (retired)
  2. WHIRR-63

Support EC2 Cluster Compute Groups for Hadoop

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: service/hadoop
    • Labels: None

    Description

      We should support the new EC2 cluster compute groups which have high bandwidth between nodes in the cluster. See http://docs.amazonwebservices.com/AWSEC2/latest/DeveloperGuide/index.html?using_cluster_computing.html

      Attachments

        1. WHIRR-63.patch
          6 kB
          Andrew Bayer

        Issue Links

          Activity

            savu.andrei Andrei Savu added a comment -

            Does jclouds support EC2 placement groups so that we can implement this one?


            adrian@jclouds.org Adrian Cole (Inactive) added a comment -

            If you choose templateBuilder.fastest() using the aws-ec2 provider, a cluster compute instance will be used and a placement group will be set up automatically.

            If you want to specify the placement group manually, use AWSEC2TemplateOptions.placementGroup.

            savu.andrei Andrei Savu added a comment -

            Tom would it be enough if we expose the "fastest" template selection strategy as a configuration property?

            tomwhite Thomas White added a comment -

            Yes, that would do it.

            epollan Evan Pollan added a comment - - edited

            cc1.4xlarge instance price was just halved. Even explicitly specifying us-east-1/ami-e4a7558d (the only non-RHEL image suggested by AWS for cluster compute instances – see WHIRR-148) gives the following error:

            Exception in thread "main" java.util.NoSuchElementException: imageId(us-east-1/ami-e4a7558d) not found
            	at org.jclouds.compute.domain.internal.TemplateBuilderImpl.build(TemplateBuilderImpl.java:567)
            	at org.apache.whirr.actions.BootstrapClusterAction.buildTemplate(BootstrapClusterAction.java:168)
            	at org.apache.whirr.actions.BootstrapClusterAction.doAction(BootstrapClusterAction.java:114)
            	at org.apache.whirr.actions.ScriptBasedClusterAction.execute(ScriptBasedClusterAction.java:80)
            	at org.apache.whirr.ClusterController.launchCluster(ClusterController.java:106)
            	at org.apache.whirr.cli.command.LaunchClusterCommand.run(LaunchClusterCommand.java:62)
            	at org.apache.whirr.cli.Main.run(Main.java:64)
            	at org.apache.whirr.cli.Main.main(Main.java:97)
            
            savu.andrei Andrei Savu added a comment -

            An AMI that can be used for cluster compute instances in jclouds should match the following queries:

            See providers/aws-ec2/src/main/java/org/jclouds/aws/ec2/AWSEC2PropertiesBuilder.java

                  // amazon, alestic, canonical, and rightscale
                  properties.setProperty(PROPERTY_EC2_AMI_QUERY,
                           "owner-id=137112412989,063491364108,099720109477,411009282317;state=available;image-type=machine");
                  // amis that work with the cluster instances
                  properties.setProperty(PROPERTY_EC2_CC_REGIONS, Region.US_EAST_1);
                  properties
                           .setProperty(
                                    PROPERTY_EC2_CC_AMI_QUERY,
                                    "virtualization-type=hvm;architecture=x86_64;owner-id=137112412989,099720109477;hypervisor=xen;state=available;image-type=machine;root-device-type=ebs");
            

            I suggest you try with a 64bit AMI provided either by Amazon or Canonical.
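
            To make the query strings above easier to read, here is a minimal sketch of how such a semicolon-separated filter could be interpreted. It is not jclouds code: the class name AmiQuerySketch, the parse and matches helpers, and the sample image attributes are all hypothetical. It assumes the usual EC2 filter semantics, where comma-separated values of one filter are alternatives and all filters must hold together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a stand-alone reader for "key=v1,v2;key2=v3" filter strings.
// This is NOT how jclouds parses PROPERTY_EC2_CC_AMI_QUERY internally.
public class AmiQuerySketch {

    // Split the query into an ordered map of filter name -> accepted values.
    public static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> filters = new LinkedHashMap<>();
        for (String clause : query.split(";")) {
            if (clause.isEmpty()) {
                continue; // tolerate a trailing semicolon
            }
            int eq = clause.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("missing '=' in clause: " + clause);
            }
            String name = clause.substring(0, eq);
            List<String> values =
                    new ArrayList<>(Arrays.asList(clause.substring(eq + 1).split(",")));
            filters.put(name, values);
        }
        return filters;
    }

    // Assumed semantics: every filter must accept the image's value for that
    // attribute (AND across filters, OR across the values of one filter).
    public static boolean matches(Map<String, List<String>> filters,
                                  Map<String, String> image) {
        for (Map.Entry<String, List<String>> filter : filters.entrySet()) {
            String actual = image.get(filter.getKey());
            if (actual == null || !filter.getValue().contains(actual)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The cluster compute query quoted in the comment above.
        String ccQuery = "virtualization-type=hvm;architecture=x86_64;"
                + "owner-id=137112412989,099720109477;hypervisor=xen;"
                + "state=available;image-type=machine;root-device-type=ebs";
        Map<String, List<String>> filters = parse(ccQuery);

        // Hypothetical image attributes, made up for illustration only.
        Map<String, String> image = new HashMap<>();
        image.put("virtualization-type", "hvm");
        image.put("architecture", "x86_64");
        image.put("owner-id", "099720109477");
        image.put("hypervisor", "xen");
        image.put("state", "available");
        image.put("image-type", "machine");
        image.put("root-device-type", "ebs");
        System.out.println("HVM image matches: " + matches(filters, image));

        // The same image with paravirtual virtualization fails the first filter.
        image.put("virtualization-type", "paravirtual");
        System.out.println("Paravirtual image matches: " + matches(filters, image));
    }
}
```

            Read this way, an image whose owner is not in the owner-id list, or that is not an EBS-backed HVM image, would be filtered out of the cluster compute candidates, which is one possible reason a given imageId is reported as not found.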


            adrian@jclouds.org Adrian Cole (Inactive) added a comment -

            You can override these properties, but it looks like SLES comes with different pricing, so we shouldn't make this the default, right?
            http://aws.amazon.com/suse/

            Right now, you can use Ubuntu or Amazon Linux with cluster compute instances by default.

            epollan Evan Pollan added a comment -

            The only AMI returned by a cluster compute search was the one I tried above. I tried a search for HVM AMIs, and the only hit I got was an Amazon AMI (ami-0da96764).

            I got further, but still wasn't able to get a cluster up and running:

            <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
            <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
            <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
            <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
            <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
            <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
            <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
            <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
            <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
            <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
            <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
            <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
            <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
            <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
            <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
            Dying because - java.net.SocketTimeoutException: Read timed out
            Dying because - java.net.SocketTimeoutException: Read timed out
            Dying because - java.net.SocketTimeoutException: Read timed out
            Dying because - java.net.SocketTimeoutException: Read timed out
            Dying because - java.net.SocketTimeoutException: Read timed out
            Dying because - java.net.SocketTimeoutException: Read timed out
            Dying because - java.net.SocketTimeoutException: Read timed out
            Dying because - java.net.SocketTimeoutException: Read timed out
            Dying because - java.net.SocketTimeoutException: Read timed out
            Dying because - java.net.SocketTimeoutException: Read timed out
            <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
            Dying because - java.net.SocketTimeoutException: Read timed out
            Dying because - java.net.SocketTimeoutException: Read timed out
            _...lots of lines redacted..._
            Dying because - java.net.SocketTimeoutException: Read timed out
            Dying because - java.net.SocketTimeoutException: Read timed out
            <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
            << (ec2-user@75.101.226.57:22) error acquiring SSHClient(ec2-user@75.101.226.57:22): Exhausted available authentication methods
            net.schmizz.sshj.userauth.UserAuthException: Exhausted available authentication methods
            	at net.schmizz.sshj.userauth.UserAuthImpl.authenticate(UserAuthImpl.java:114)
            	at net.schmizz.sshj.SSHClient.auth(SSHClient.java:204)
            	at net.schmizz.sshj.SSHClient.authPublickey(SSHClient.java:304)
            	at net.schmizz.sshj.SSHClient.authPublickey(SSHClient.java:323)
            	at org.jclouds.sshj.SshjSshClient$1.create(SshjSshClient.java:183)
            	at org.jclouds.sshj.SshjSshClient$1.create(SshjSshClient.java:155)
            	at org.jclouds.sshj.SshjSshClient.acquire(SshjSshClient.java:204)
            	at org.jclouds.sshj.SshjSshClient.connect(SshjSshClient.java:229)
            	at org.jclouds.compute.callables.RunScriptOnNodeAsInitScriptUsingSsh.call(RunScriptOnNodeAsInitScriptUsingSsh.java:107)
            	at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:69)
            	at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:44)
            	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
            	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
            	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
            	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
            	at java.lang.Thread.run(Thread.java:636)
            Caused by: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
            	at net.schmizz.sshj.userauth.UserAuthImpl.handle(UserAuthImpl.java:157)
            	at net.schmizz.sshj.transport.TransportImpl.handle(TransportImpl.java:474)
            	at net.schmizz.sshj.transport.Decoder.decode(Decoder.java:127)
            	at net.schmizz.sshj.transport.Decoder.received(Decoder.java:195)
            	at net.schmizz.sshj.transport.Reader.run(Reader.java:72)
            << problem applying options to node(us-east-1/i-3bceb758): 
            

            Is there a Hadoop-friendly AMI that's known to work on cluster compute nodes?

            zzhang Dafu added a comment -

            As Adrian suggested
            "
            if you choose templateBuilder.fastest() using the aws-ec2 provider, a cluster compute instance will be used and setup placement group automatically.

            if you want to specify placementgroup manually use AWSEC2TemplateOptions.placementGroup
            "

            'fastest' function could be a nice thing to expose, but it is not the solution to this problem. I think we have to go with calling placementGroup explicitly, but placementGroup is not exposed in the TemplateOptions interface. Any suggestions?


            adrian@jclouds.org Adrian Cole (Inactive) added a comment -

            Per the note above, AWSEC2TemplateOptions is what you want. It is a subclass of TemplateOptions.

            abayer Andrew Bayer added a comment -

            So, making sure I understand this, we'd create a placement group first and then add that placement group to the TemplateOptions, akin to how we do spot pricing currently?

            abayer Andrew Bayer added a comment -

            Hrm - looks like I'm hitting a similar problem to one Sean reported on the mailing list. I can't get any HVM AMIs to actually start via jclouds. I'll keep digging, but this is, unsurprisingly, annoying.

            zzhang Dafu added a comment -

            The placement group is the first step. Then you need to make sure you claim the instance storage that you are entitled to: when you launch the instances through the API tools, specify the -b option. See http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/InstanceStorage.html

            I hope that I can help you to avoid some of the pitfalls. I ended up doing most of the stuff manually. I only did my testing once, so not too bad. Good luck.

            abayer Andrew Bayer added a comment -

            Yeah, I've verified - it's not possible to create at least a Linux HVM instance at this point - see http://code.google.com/p/jclouds/issues/detail?id=1024

            abayer Andrew Bayer added a comment -

            I've got a possible fix on the jclouds side, but that'll require moving to jclouds 1.5, which means WHIRR-593 needs to be put to bed first.

            abayer Andrew Bayer added a comment -

            This is a preliminary patch - it depends on Whirr going to jclouds 1.5, and jclouds 1.5 itself fixing http://code.google.com/p/jclouds/issues/detail?id=1024 (which may be done shortly anyway), but with those in place, it works. If you specify whirr.hardware-id=cc1.4xlarge and a valid HVM AMI (which admittedly is a bit hairy at the moment - there's a Precise HVM AMI, but no Lucid HVM AMI), it'll automatically create a placement group and put all the nodes from the created cluster in it. And if you specify whirr.aws-ec2-placement-group=foo, it'll look for an existing placement group named "foo" and put the nodes in that.
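
            The behaviour described in this comment can be sketched roughly as follows. This is not the code from WHIRR-63.patch: the class PlacementGroupSketch, the "cc" prefix check, and the generated "whirr-<cluster name>" group name are assumptions made for illustration. Only the two property names, whirr.hardware-id and whirr.aws-ec2-placement-group, come from the comment.

```java
import java.util.Properties;

// Illustrative only: decides which placement group name a cluster would use.
// The real patch works through jclouds; nothing here talks to EC2.
public class PlacementGroupSketch {

    static final String HARDWARE_ID = "whirr.hardware-id";
    static final String PLACEMENT_GROUP = "whirr.aws-ec2-placement-group";

    // Assumption: hardware ids starting with "cc" (e.g. cc1.4xlarge) are
    // cluster compute types.
    public static boolean isClusterCompute(String hardwareId) {
        return hardwareId != null && hardwareId.startsWith("cc");
    }

    // Returns the group name to use, or null when no placement group applies.
    public static String resolvePlacementGroup(Properties config, String clusterName) {
        String explicit = config.getProperty(PLACEMENT_GROUP);
        if (explicit != null && !explicit.trim().isEmpty()) {
            // The user named an existing group: put the nodes in that one.
            return explicit.trim();
        }
        if (isClusterCompute(config.getProperty(HARDWARE_ID))) {
            // Hypothetical naming scheme for an automatically created group.
            return "whirr-" + clusterName;
        }
        return null;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(HARDWARE_ID, "cc1.4xlarge");
        System.out.println("auto group: " + resolvePlacementGroup(config, "demo"));

        config.setProperty(PLACEMENT_GROUP, "foo");
        System.out.println("explicit group: " + resolvePlacementGroup(config, "demo"));
    }
}
```

            In a Whirr properties file this would correspond to setting whirr.hardware-id=cc1.4xlarge, and optionally whirr.aws-ec2-placement-group=foo when an existing group should be reused. Whether an explicit group is also honoured for non-cluster-compute hardware is not stated in the comment; the sketch assumes it is.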

            savu.andrei Andrei Savu added a comment -

            Looks good to me. +1 to commit as soon as we upgrade to jclouds 1.5.0-beta.7

            abayer Andrew Bayer added a comment -

            Committed.


            People

              Assignee: abayer Andrew Bayer
              Reporter: tomwhite Thomas White
              Votes: 2
              Watchers: 6
