
Issue Links:
This issue relates to: HADOOP-4745 EC2 scripts should configure Hadoop to use all available disks on large instances
Description
By running a benchmark on EC2 we can see how well Hadoop performs, how to tune it, and how performance changes between releases.
The benchmark script:
1. Launches a cluster on EC2
2. Waits for the cluster and Hadoop daemons to start
3. Runs a small sort job to warm up the cluster
4. Runs a sort job and emits the job duration
5. Terminates the cluster
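The five steps above can be sketched roughly as follows. This is not the script from this issue, only an illustration of the control flow: the four callables (`launch`, `wait_until_ready`, `run_sort`, `terminate`) are hypothetical stand-ins for whatever the EC2 scripts and Hadoop commands actually do, and the `try`/`finally` that terminates the cluster even when a step fails is an added assumption.

```python
import time
from typing import Callable


def run_benchmark(
    launch: Callable[[], None],
    wait_until_ready: Callable[[], None],
    run_sort: Callable[[str], None],
    terminate: Callable[[], None],
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Run the five steps and return the timed sort's duration in seconds.

    All four step callables are hypothetical placeholders supplied by the caller.
    """
    launch()                    # 1. launch a cluster on EC2
    try:
        wait_until_ready()      # 2. wait for the cluster and Hadoop daemons
        run_sort("warmup")      # 3. small sort job to warm up the cluster
        start = clock()
        run_sort("timed")       # 4. the sort job that is actually measured
        duration = clock() - start
        print(f"sort took {duration:.0f} seconds")
        return duration
    finally:
        terminate()             # 5. terminate the cluster (assumption: also on failure)
```

Passing the steps in as callables keeps the sketch runnable without an EC2 account, and only the second sort is timed so that the warm-up cost is excluded.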
On an 8-node cluster, sorting 32GB of data took 2742 seconds with the default hadoop-site.xml that the EC2 scripts use. Better settings should improve on this.
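To put that figure in perspective, the sort throughput can be worked out from the numbers above. The sketch below assumes 1GB = 1024MB and that all 8 nodes did sort work (the issue does not say whether the count includes the master), so treat the per-node figure as approximate.

```python
from typing import Tuple


def sort_throughput_mb_per_s(data_gb: float, seconds: float, nodes: int) -> Tuple[float, float]:
    """Return (aggregate, per-node) throughput in MB/s.

    Assumes 1 GB = 1024 MB and that every node counted did sort work.
    """
    total_mb = data_gb * 1024
    aggregate = total_mb / seconds
    return aggregate, aggregate / nodes


# Figures reported in this issue: 32GB sorted in 2742 seconds on 8 nodes.
aggregate, per_node = sort_throughput_mb_per_s(32, 2742, 8)
print(f"{aggregate:.1f} MB/s aggregate, {per_node:.2f} MB/s per node")
```

That is about 12 MB/s across the whole cluster, or roughly 1.5 MB/s per node.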
Several improvements could be made to the script, particularly in detecting when the cluster is ready: the current script waits until 90% of the nodes are up, then waits a further minute for Hadoop to start. There are more ideas here: http://www.nabble.com/Auto-shutdown-for-EC2-clusters-td20132561.html
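The readiness check described above (90% of nodes up, then a fixed one-minute wait) could look something like this. It is a sketch, not the script's actual code: `count_running` is a hypothetical callable that returns how many nodes are currently up, and the 10-second poll interval is an assumption, since the issue does not say how often the script checks.

```python
import time
from typing import Callable


def wait_for_cluster(
    count_running: Callable[[], int],
    expected_nodes: int,
    threshold: float = 0.9,        # 90% of the nodes, as described in the issue
    settle_seconds: float = 60.0,  # the fixed one-minute wait for Hadoop to start
    poll_seconds: float = 10.0,    # assumption: poll interval is not stated in the issue
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until enough nodes are up, then wait a fixed settle period.

    Returns the number of nodes that were running when the threshold was met.
    """
    needed = threshold * expected_nodes
    running = count_running()
    while running < needed:
        sleep(poll_seconds)
        running = count_running()
    sleep(settle_seconds)
    return running
```

The fixed settle period is the weak point: it neither confirms that the Hadoop daemons have started nor bounds the total wait, so a timeout and a direct check of the daemons would be natural additions.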
It would also be good to do multiple runs, discard the first, and compute an average of the rest.
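A minimal sketch of that averaging step, assuming the job durations have already been collected into a list in run order (the function name is hypothetical):

```python
from statistics import mean
from typing import Sequence


def average_after_warmup(durations: Sequence[float]) -> float:
    """Discard the first run and return the mean of the remaining durations."""
    if len(durations) < 2:
        raise ValueError("need at least two runs: one to discard, one to average")
    return mean(durations[1:])
```

With only a handful of runs a single outlier can still skew the mean, so reporting the individual durations alongside the average may be worthwhile.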
This should be a good basis for running a regular EC2 benchmark from Hudson.
Comments welcome.