Tom White added a comment - 12/Dec/07 03:37 PM
Another motivation for this change is to make it more straightforward to add nodes on the fly to an existing cluster. The only change needed would be an extra parameter in the launch script to indicate whether to start a master node - if set to true instance 0 would start as the master (as it does at the moment), otherwise all the new instances would connect to an existing master.
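A minimal sketch of how a launch script might branch on such a parameter. The function name `role_for_instance` and its argument order are hypothetical and not taken from the actual scripts; only the rule (instance 0 becomes the master when a master is requested, everything else joins as a slave) comes from the comment above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide the role of an instance from its launch index.
# Arguments: $1 = launch index (0-based), $2 = start-master flag ("true" or "false").
role_for_instance() {
  local index="$1"
  local start_master="$2"
  if [ "$start_master" = "true" ] && [ "$index" -eq 0 ]; then
    echo "master"   # instance 0 of a freshly launched cluster becomes the master
  else
    echo "slave"    # all other instances connect to a master (new or existing)
  fi
}
```

With the flag set to false, every new instance, including instance 0, would take the slave role and attach to the existing master.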
Amazon has just released a new feature, EC2 Availability Zones (http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1347&categoryID=112
Also, it is now possible to use the public IP address of EC2 nodes from within the EC2 cluster (contrary to the comment in the description above). However, this will incur data transfer costs, which can be avoided by using the private IP address. See http://docs.amazonwebservices.com/AWSEC2/2008-02-01/DeveloperGuide/instance-addressing.html
Here is a first cut at supporting multiple concurrent clusters, the image instance sizes, zone availability, and ganglia.
This patch represents a fair number of changes and will need accompanying documentation.
The typical use case is this:
> hadoop-ec2 launch-cluster my-group 5
In another window (after launch-cluster), this is quite useful, and works well with FoxyProxy:
> hadoop-ec2
There are still some rough edges I think.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12378761/concurrent-clusters.patch against trunk revision 619744.
@author +1. The patch does not contain any @author tags.
tests included -1. The patch doesn't appear to include any new or modified tests.
patch -1. The patch command could not apply the patch.
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2084/console
This message is automatically generated.
submitted with correct paths
removed DFS_WRITE_RETRIES as it's not really necessary with the new kernels.
a tar of all relevant files.
this version checks both current releases and archives for the hadoop distro
I've just committed this. Thanks Chris!
I tried out the new scripts and they worked fine. I changed the version of Hadoop in the env file to be 0.17.0 so that it picks up the new AMI when it is created (after 0.17.0 is released). I also changed the version of Java to 1.6.0_05. Chris, could you update the documentation on the wiki page with the changes please? It would be worth keeping the instructions for the older scripts around on the same page.
Integrated in Hadoop-trunk #451 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/451/)
One change I've had to make is to add the memory option for the processes the cluster launches in the hadoop-site.xml that gets generated. This would probably be a good thing to make configurable for the end user.
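A rough sketch of how a generated hadoop-site.xml could take such a value from a user-settable variable. The helper name `site_property` is hypothetical, and the comment does not say which property was set; `mapred.child.java.opts` appears in the usage note purely as an illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one <property> element for a generated hadoop-site.xml.
# Arguments: $1 = property name, $2 = property value.
site_property() {
  cat <<EOF
<property>
  <name>$1</name>
  <value>$2</value>
</property>
EOF
}
```

For example, `site_property mapred.child.java.opts "-Xmx${CHILD_HEAP_MB:-200}m"` would emit a heap setting that the user can override through an (assumed) `CHILD_HEAP_MB` environment variable.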
I also have manually installed nph-proxy on the master node, with http authentication – makes it much easier to get around the slave nodes.
Good idea re configurable memory for the trackers and datanode services, though I find the defaults fine (so far). But I tend to pass my child vm option in per job since they vary. Still a good idea to provide the option.
Note that: hadoop-ec2 proxy <cluster-name> starts a local SOCKS tunnel. Used with FoxyProxy FF plugin, you can browse your cluster.
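For readers unfamiliar with SOCKS tunnels, the general technique is ssh dynamic port forwarding. The sketch below only builds the command string; the function name, the `root` login, and the default port 6666 are assumptions for illustration, and the real `hadoop-ec2 proxy` implementation may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build (but do not run) an ssh command that opens a local SOCKS proxy.
#   -D <port>  dynamic (SOCKS) port forwarding on the local machine
#   -N         do not run a remote command, only forward traffic
# Arguments: $1 = master hostname, $2 = local port (optional, assumed default 6666).
socks_tunnel_cmd() {
  local master_host="$1"
  local local_port="${2:-6666}"
  echo "ssh -N -D ${local_port} root@${master_host}"
}
```

Pointing FoxyProxy at `localhost` and the chosen port as a SOCKS proxy would then route browser requests through the master node.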