Whirr / WHIRR-92

Add a benchmark for Hadoop clusters

    Details

    • Type: Test
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.3.0
    • Component/s: service/hadoop
    • Labels:
      None

      Description

      To help tune cluster configurations on various providers we should run and record the results of a benchmark (e.g. Terasort) on Hadoop clusters.

      1. WHIRR-92.patch
        24 kB
        Tom White
      2. WHIRR-92.patch
        23 kB
        Tom White
      3. WHIRR-92.patch
        23 kB
        Tom White

          Activity

          Tom White added a comment -

          I've just committed this. I also added some documentation to https://cwiki.apache.org/confluence/display/WHIRR/Running+Benchmarks.

          Tom White added a comment -

          Renamed BenchmarkSuite to HadoopBenchmarkSuite.

          I successfully ran the benchmarks with m1.large instances on EC2:

          whirr.hardware-id=m1.large
          # Ubuntu 10.04 LTS Lucid instance-store - see http://alestic.com/
          whirr.image-id=us-east-1/ami-da0cf8b3
          whirr.location-id=us-east-1
          
          Tom White added a comment -

          Updated for WHIRR-87.

          Tom White added a comment -

          Here's a patch for running the TeraSort and TestDFSIO (a test of HDFS throughput) benchmarks. There's more work to do to run this against clusters and tune settings, but I think this is ready for inclusion.

          The version of TestDFSIO needed is the latest from the 0.20 Hadoop branch (since it implements the Tool interface), which can be built with ant jar-test and copied to services/hadoop/lib.

          To run the whole benchmark suite:

          mvn verify -Pintegration -DargLine="-Dwhirr.test.identity=... -Dwhirr.test.credential=... -Dconfig=.whirr-test.properties" -Dit.test=BenchmarkSuite
          

          To run a particular test:

          mvn verify -Pintegration -DargLine="-Dwhirr.test.identity=... -Dwhirr.test.credential=... -Dconfig=.whirr-test.properties" -Dit.test=HadoopServiceTestDFSIOBenchmark
          

          By default no benchmarks are run when you run integration tests.
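For repeated runs it can help to assemble the two commands above from one place. The following is a minimal sketch, not part of the patch: the function name `benchmark_cmd` and the environment variables `WHIRR_IDENTITY`, `WHIRR_CREDENTIAL` and `WHIRR_TEST_CONFIG` are hypothetical names chosen for illustration. It only prints the Maven command line so it can be reviewed before it is run.

```shell
#!/bin/sh
# Sketch: build the Maven command line for a Whirr benchmark run.
# Assumption: WHIRR_IDENTITY, WHIRR_CREDENTIAL and WHIRR_TEST_CONFIG are
# hypothetical variable names; they are not defined by Whirr itself.

benchmark_cmd() {
  # $1 = integration test class to run, e.g. BenchmarkSuite
  it_test="$1"
  # Fall back to the properties file name used in the commands above.
  config="${WHIRR_TEST_CONFIG:-.whirr-test.properties}"
  printf 'mvn verify -Pintegration -DargLine="-Dwhirr.test.identity=%s -Dwhirr.test.credential=%s -Dconfig=%s" -Dit.test=%s\n' \
    "${WHIRR_IDENTITY}" "${WHIRR_CREDENTIAL}" "${config}" "${it_test}"
}

# Print the command for the requested test class (default: the whole suite).
benchmark_cmd "${1:-BenchmarkSuite}"
```

Passing a class name such as HadoopServiceTestDFSIOBenchmark as the first argument prints the single-test form of the command. Note that the printed line contains the credential, so it should not be written to shared logs.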

          Patrick Hunt added a comment -

          Not a blocker for 0.2.0, pushing to 0.3.0.

          Tom White added a comment -

          Interesting benchmark comparing Cassandra on bare metal vs. EC2: http://www.coreyhulen.org/?p=326

          Tom White added a comment -

          We should enable intermediate map-output compression using LZO.
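As a rough illustration of what this could look like for a single TeraSort run, the sketch below prints a `hadoop jar` command that turns on map-output compression through generic `-D` options. It assumes the Hadoop 0.20 property names `mapred.compress.map.output` and `mapred.map.output.compression.codec`, and it assumes the separately distributed hadoop-lzo codec class `com.hadoop.compression.lzo.LzoCodec` and its native library are installed on every node, which a stock Apache Hadoop install does not provide. The function name, the jar path and the HDFS directories are placeholders, and this is not how the patch itself configures the cluster.

```shell
#!/bin/sh
# Sketch: TeraSort command line with LZO-compressed intermediate map output.
# Assumptions: Hadoop 0.20-era property names; the hadoop-lzo codec is
# installed on all nodes; the jar path and directories are placeholders.

terasort_lzo_cmd() {
  # $1 = Hadoop examples jar, $2 = HDFS input dir, $3 = HDFS output dir
  printf 'hadoop jar %s terasort -Dmapred.compress.map.output=true -Dmapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec %s %s\n' \
    "$1" "$2" "$3"
}

# Print the command for review rather than running it.
terasort_lzo_cmd "${HADOOP_EXAMPLES_JAR:-hadoop-examples.jar}" \
  /benchmarks/terasort-input /benchmarks/terasort-output
```

Setting the properties per job keeps the experiment local to one benchmark run; making LZO the cluster default would instead mean putting the same two properties in the cluster's MapReduce configuration.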


            People

            • Assignee:
              Tom White
              Reporter:
              Tom White
            • Votes:
              0
              Watchers:
              1
