Tom White made changes - 26/Nov/08 05:39 PM
Looks good Tom. A couple comments:
Argh, Jira wiki notation ate my code snippet.
sort_minutes=`expr ${sort_duration} / 60`
echo "YVALUE=${sort_minutes}" > sort_minutes.properties
Tom White made changes - 27/Nov/08 01:33 PM
Tom White made changes - 01/Dec/08 02:57 PM
Nigel Daley made changes - 01/Dec/08 05:12 PM
1. Launches a cluster on EC2
2. Waits for the cluster and Hadoop daemons to start
3. Runs a small sort job to warm up the cluster
4. Runs a sort job and emits the job duration
5. Terminates the cluster
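Step 4 (run a job and emit its duration) could look roughly like the sketch below. The time_job helper is hypothetical and not part of the EC2 scripts; it times whatever command it is given using wall-clock seconds from date.

```shell
# Sketch only: time an arbitrary command and print the elapsed wall-clock seconds.
# time_job is a hypothetical helper, not something the Hadoop EC2 scripts provide.
time_job() {
  start=$(date +%s)
  # Discard the job's own output so that only the duration is printed.
  "$@" >/dev/null 2>&1
  status=$?
  end=$(date +%s)
  echo $(( end - start ))
  # Pass the job's exit status back to the caller.
  return "$status"
}
```

For example, sort_duration=$(time_job sleep 2) stores the elapsed seconds in sort_duration; a real run would pass the sort job's command line instead of sleep.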
On an 8-node cluster, it took 2742 seconds (about 46 minutes) to sort 32 GB of data using the default hadoop-site.xml that the EC2 scripts use. Tuning those settings should improve the time.
Several improvements could be made to the script, particularly in detecting when the cluster is ready to go (the current script waits until 90% of the nodes are up, then waits 1 minute for Hadoop to start). There are more ideas here: http://www.nabble.com/Auto-shutdown-for-EC2-clusters-td20132561.html
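A minimal sketch of the "wait until 90% of the nodes are up" check, with a bounded number of polls added. The wait_for_nodes helper, the attempt limit, and rounding 90% up to a whole node are all assumptions for illustration; the command that reports how many nodes are up is supplied by the caller and is not shown here.

```shell
# Sketch only: poll until at least 90% of the expected nodes report as up.
# Usage: wait_for_nodes EXPECTED MAX_ATTEMPTS INTERVAL_SECONDS COUNT_CMD [ARGS...]
# COUNT_CMD is a caller-supplied (hypothetical) command that prints the number
# of nodes currently up.
wait_for_nodes() {
  expected=$1; shift
  max_attempts=$1; shift
  interval=$1; shift
  # Smallest whole number of nodes that is at least 90% of expected
  # (rounding up is an assumption).
  needed=$(( (expected * 9 + 9) / 10 ))
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    # Treat a failing or silent count command as "no nodes up yet".
    up=$("$@" 2>/dev/null) || up=0
    if [ "${up:-0}" -ge "$needed" ]; then
      return 0
    fi
    attempt=$(( attempt + 1 ))
    sleep "$interval"
  done
  # Gave up: the cluster never reached the threshold.
  return 1
}
```

With 8 expected nodes the threshold works out to 8, and with 10 it is 9. The fixed 1 minute wait for the Hadoop daemons would still follow a successful return unless that is replaced by a similar poll.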
It would also be good to do multiple runs, discard the first, and average the rest.
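The averaging could be sketched as below. average_after_warmup is a hypothetical helper that takes one duration in seconds per run, drops the first, and prints the integer mean of the rest; the numbers in the usage note are made up for illustration, not measured results.

```shell
# Sketch only: discard the first duration and print the integer mean of the rest.
# Usage: average_after_warmup SECONDS_RUN1 SECONDS_RUN2 [SECONDS_RUN3 ...]
average_after_warmup() {
  if [ "$#" -lt 2 ]; then
    echo "need at least two durations" >&2
    return 1
  fi
  # Drop the first run.
  shift
  total=0
  for d in "$@"; do
    total=$(( total + d ))
  done
  # Integer division: any fractional second is truncated.
  echo $(( total / $# ))
}
```

For example, average_after_warmup 3000 2700 2800 ignores the 3000 and prints 2750.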
This should be a good basis for running a regular EC2 benchmark from Hudson.
Comments welcome.