Details
- New Feature
- Status: Resolved
- Major
- Resolution: Fixed
Description
We should support the new EC2 cluster compute groups which have high bandwidth between nodes in the cluster. See http://docs.amazonwebservices.com/AWSEC2/latest/DeveloperGuide/index.html?using_cluster_computing.html
Attachments
- WHIRR-63.patch
- 6 kB
- Andrew Bayer
Activity
If you choose templateBuilder.fastest() with the aws-ec2 provider, a cluster compute instance will be used and a placement group will be set up automatically.
If you want to specify the placement group manually, use AWSEC2TemplateOptions.placementGroup.
Tom, would it be enough if we exposed the "fastest" template selection strategy as a configuration property?
The cc1.4xlarge instance price was just halved. Even explicitly specifying us-east-1/ami-e4a7558d (the only non-RHEL image suggested by AWS for cluster compute instances; see WHIRR-148) gives the following error:
Exception in thread "main" java.util.NoSuchElementException: imageId(us-east-1/ami-e4a7558d) not found
at org.jclouds.compute.domain.internal.TemplateBuilderImpl.build(TemplateBuilderImpl.java:567)
at org.apache.whirr.actions.BootstrapClusterAction.buildTemplate(BootstrapClusterAction.java:168)
at org.apache.whirr.actions.BootstrapClusterAction.doAction(BootstrapClusterAction.java:114)
at org.apache.whirr.actions.ScriptBasedClusterAction.execute(ScriptBasedClusterAction.java:80)
at org.apache.whirr.ClusterController.launchCluster(ClusterController.java:106)
at org.apache.whirr.cli.command.LaunchClusterCommand.run(LaunchClusterCommand.java:62)
at org.apache.whirr.cli.Main.run(Main.java:64)
at org.apache.whirr.cli.Main.main(Main.java:97)
An AMI that can be used for cluster compute instances in jclouds should match the following queries (see providers/aws-ec2/src/main/java/org/jclouds/aws/ec2/AWSEC2PropertiesBuilder.java):
// amazon, alestic, canonical, and rightscale
properties.setProperty(PROPERTY_EC2_AMI_QUERY, "owner-id=137112412989,063491364108,099720109477,411009282317;state=available;image-type=machine");
// amis that work with the cluster instances
properties.setProperty(PROPERTY_EC2_CC_REGIONS, Region.US_EAST_1);
properties.setProperty(PROPERTY_EC2_CC_AMI_QUERY, "virtualization-type=hvm;architecture=x86_64;owner-id=137112412989,099720109477;hypervisor=xen;state=available;image-type=machine;root-device-type=ebs");
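The query strings above follow a simple layout: clauses separated by ';', each of the form name=value1,value2. As a rough illustration of how the cluster compute query breaks down into individual image filters, here is a minimal sketch. AmiQueryParser and its parse method are hypothetical names introduced for this example; they are not part of jclouds, and jclouds' own handling of these properties may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of jclouds: splits an AMI query string
// into filter names and the values accepted for each filter.
class AmiQueryParser {

    static Map<String, List<String>> parse(String query) {
        // LinkedHashMap keeps the filters in the order they appear in the query.
        Map<String, List<String>> filters = new LinkedHashMap<>();
        for (String clause : query.split(";")) {
            if (clause.trim().isEmpty()) {
                continue; // tolerate empty input or a trailing ';'
            }
            int eq = clause.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Missing '=' in clause: " + clause);
            }
            String name = clause.substring(0, eq).trim();
            List<String> values = new ArrayList<>();
            for (String value : clause.substring(eq + 1).split(",")) {
                values.add(value.trim());
            }
            filters.put(name, values);
        }
        return filters;
    }

    public static void main(String[] args) {
        // The cluster compute query quoted in the comment above.
        String ccQuery = "virtualization-type=hvm;architecture=x86_64;"
            + "owner-id=137112412989,099720109477;hypervisor=xen;"
            + "state=available;image-type=machine;root-device-type=ebs";
        parse(ccQuery).forEach((name, values) ->
            System.out.println(name + " -> " + values));
    }
}
```

Read this way, an image only qualifies for cluster compute if it is an HVM, x86_64, EBS-backed machine image owned by one of the two listed owner IDs, which is why an AMI from any other owner is reported as not found.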
I suggest you try with a 64-bit AMI provided by either Amazon or Canonical.
You can override these properties, but it looks like SLES comes with different pricing, so we shouldn't make this the default, right?
http://aws.amazon.com/suse/
Right now, you can use Ubuntu or Amazon Linux with cluster compute instances by default.
The only AMI returned by a cluster compute search was the one I tried above. I tried a search for HVM AMIs, and the only hit I got was an Amazon AMI (ami-0da96764).
I got further, but still wasn't able to get a cluster up and running:
<<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
<<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
<<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
<<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
<<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
<<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
<<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
<<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
<<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
<<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
<<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
<<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
<<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
<<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
<<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
Dying because - java.net.SocketTimeoutException: Read timed out
Dying because - java.net.SocketTimeoutException: Read timed out
Dying because - java.net.SocketTimeoutException: Read timed out
Dying because - java.net.SocketTimeoutException: Read timed out
Dying because - java.net.SocketTimeoutException: Read timed out
Dying because - java.net.SocketTimeoutException: Read timed out
Dying because - java.net.SocketTimeoutException: Read timed out
Dying because - java.net.SocketTimeoutException: Read timed out
Dying because - java.net.SocketTimeoutException: Read timed out
Dying because - java.net.SocketTimeoutException: Read timed out
<<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
Dying because - java.net.SocketTimeoutException: Read timed out
Dying because - java.net.SocketTimeoutException: Read timed out
_...lots of lines redacted..._
Dying because - java.net.SocketTimeoutException: Read timed out
Dying because - java.net.SocketTimeoutException: Read timed out
<<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
<< (ec2-user@75.101.226.57:22) error acquiring SSHClient(ec2-user@75.101.226.57:22): Exhausted available authentication methods
net.schmizz.sshj.userauth.UserAuthException: Exhausted available authentication methods
at net.schmizz.sshj.userauth.UserAuthImpl.authenticate(UserAuthImpl.java:114)
at net.schmizz.sshj.SSHClient.auth(SSHClient.java:204)
at net.schmizz.sshj.SSHClient.authPublickey(SSHClient.java:304)
at net.schmizz.sshj.SSHClient.authPublickey(SSHClient.java:323)
at org.jclouds.sshj.SshjSshClient$1.create(SshjSshClient.java:183)
at org.jclouds.sshj.SshjSshClient$1.create(SshjSshClient.java:155)
at org.jclouds.sshj.SshjSshClient.acquire(SshjSshClient.java:204)
at org.jclouds.sshj.SshjSshClient.connect(SshjSshClient.java:229)
at org.jclouds.compute.callables.RunScriptOnNodeAsInitScriptUsingSsh.call(RunScriptOnNodeAsInitScriptUsingSsh.java:107)
at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:69)
at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:44)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Caused by: net.schmizz.sshj.userauth.UserAuthException: publickey auth failed
at net.schmizz.sshj.userauth.UserAuthImpl.handle(UserAuthImpl.java:157)
at net.schmizz.sshj.transport.TransportImpl.handle(TransportImpl.java:474)
at net.schmizz.sshj.transport.Decoder.decode(Decoder.java:127)
at net.schmizz.sshj.transport.Decoder.received(Decoder.java:195)
at net.schmizz.sshj.transport.Reader.run(Reader.java:72)
<< problem applying options to node(us-east-1/i-3bceb758):
Is there a Hadoop-friendly AMI that's known to work on cluster compute nodes?
As Adrian suggested:
"If you choose templateBuilder.fastest() with the aws-ec2 provider, a cluster compute instance will be used and a placement group will be set up automatically.
If you want to specify the placement group manually, use AWSEC2TemplateOptions.placementGroup."
The 'fastest' function could be a nice thing to expose, but it is not the solution to this problem. I think we have to call placementGroup explicitly, but placementGroup is not exposed in the TemplateOptions interface. Any suggestions?
Per the note above, AWSEC2TemplateOptions is what you want. It is a subclass of TemplateOptions.
So, making sure I understand this, we'd create a placement group first and then add that placement group to the TemplateOptions, akin to how we do spot pricing currently?
Hrm - looks like I'm hitting a similar problem to one Sean reported on the mailing list. I can't get any HVM AMIs to actually start via jclouds. I'll keep digging, but this is, unsurprisingly, annoying.
The placement group is the first step. Then you need to make sure you claim the storage that you are entitled to. If you use the API, you need to specify the -b option when you launch the instances. See http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/InstanceStorage.html
I hope that I can help you to avoid some of the pitfalls. I ended up doing most of the stuff manually. I only did my testing once, so not too bad. Good luck.
Yeah, I've verified: it's not possible to create an HVM instance, at least not a Linux one, at this point. See http://code.google.com/p/jclouds/issues/detail?id=1024
I've got a possible fix on the jclouds side, but that'll require moving to jclouds 1.5, which means WHIRR-593 needs to be put to bed first.
This is a preliminary patch - it depends on Whirr going to jclouds 1.5, and jclouds 1.5 itself fixing http://code.google.com/p/jclouds/issues/detail?id=1024 (which may be done shortly anyway), but with those in place, it works. If you specify whirr.hardware-id=cc1.4xlarge and a valid HVM AMI (which admittedly is a bit hairy at the moment - there's a Precise HVM AMI, but no Lucid HVM AMI), it'll automatically create a placement group and put all the nodes from the created cluster in it. And if you specify whirr.aws-ec2-placement-group=foo, it'll look for an existing placement group named "foo" and put the nodes in that.
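To make the two modes described above concrete, here is a minimal sketch of what such a configuration might look like, loaded through java.util.Properties so the keys can be inspected. The whirr.hardware-id and whirr.aws-ec2-placement-group keys come from the comment above; whirr.provider and whirr.image-id are assumed to be the usual Whirr settings, the AMI ID is a placeholder rather than a real image, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: models the configuration only, it does not call Whirr.
class ClusterComputeConfigSketch {

    // Example configuration text in .properties syntax.
    static String sampleConfig(boolean namedGroup) {
        StringBuilder sb = new StringBuilder();
        sb.append("whirr.provider=aws-ec2\n");
        sb.append("whirr.hardware-id=cc1.4xlarge\n");
        sb.append("# Placeholder: substitute a valid HVM AMI ID\n");
        sb.append("whirr.image-id=us-east-1/ami-xxxxxxxx\n");
        if (namedGroup) {
            sb.append("# Use an existing placement group named foo\n");
            sb.append("whirr.aws-ec2-placement-group=foo\n");
        }
        return sb.toString();
    }

    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // True when the config names an existing placement group; otherwise,
    // per the comment above, one would be created automatically.
    static boolean usesExistingPlacementGroup(Properties props) {
        return props.getProperty("whirr.aws-ec2-placement-group") != null;
    }

    public static void main(String[] args) {
        Properties auto = load(sampleConfig(false));
        Properties named = load(sampleConfig(true));
        System.out.println("auto-created group: " + !usesExistingPlacementGroup(auto));
        System.out.println("existing group: " + named.getProperty("whirr.aws-ec2-placement-group"));
    }
}
```

Leaving out whirr.aws-ec2-placement-group corresponds to the automatic case in the comment above; setting it selects an existing group by name.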
Looks good to me. +1 to commit as soon as we upgrade to jclouds 1.5.0-beta.7
Does jclouds support EC2 placement groups so that we can implement this one?