Bigtop / BIGTOP-1057

Add TeraGen / TeraSort Benchmarking

    Details

    • Type: Test
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.7.0
    • Component/s: tests
    • Labels:
      None

      Description

      Benchmarking is on the roadmap for Bigtop, as per a recent mailing list thread. Let's add TeraSort / TeraGen as a starting point.

      Two things to note when adding this module:

      First: this can be done either

      1) in a new test-artifacts submodule (e.g. called benchmarking), or

      2) individually, inside each ecosystem component.

      Second: if the tests are parameterized from the pom file, the initial tests can run as simple smoke tests, and the parameters can be raised over time to turn them into real performance benchmarks.
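One way to picture the parameterization: the test reads its row count from an external setting and falls back to a small smoke-test default. The sketch below is in Python for brevity and is not the patch itself. The setting name `TERAGEN_ROWS` and the helper `rows_from_settings` are hypothetical, and the default of 1000 is the value quoted in the comments on this issue.

```python
import os

DEFAULT_ROWS = 1000  # small default so the run stays a smoke test


def rows_from_settings(settings=os.environ, key="TERAGEN_ROWS"):
    """Return the row count from settings, or the smoke-test default.

    `key` is a hypothetical setting name used only for illustration.
    """
    raw = settings.get(key)
    if raw is None or raw.strip() == "":
        return DEFAULT_ROWS
    rows = int(raw)  # raises ValueError on non-numeric input
    if rows <= 0:
        raise ValueError(f"{key} must be positive, got {rows}")
    return rows


# No override: the test stays a smoke test.
print(rows_from_settings({}))
# Override: the same test becomes a larger benchmark run.
print(rows_from_settings({"TERAGEN_ROWS": "10000000"}))
```

Keeping the default small and the override external means the same test code can serve both purposes without edits.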

      1. BIGTOP-1057.1.patch
        3 kB
        jay vyas
      2. BIGTOP-1057.1.patch
        3 kB
        jay vyas
      3. BIGTOP-1057.1.patch
        3 kB
        jay vyas

        Activity

        jay vyas added a comment -

        Here are the TeraSort/TeraGen updates, with some extra (minor) debugging information for newcomers (i.e., printing out the exact command syntax for failed tests).
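As an illustration of that kind of debugging output, the sketch below runs a command and, on failure, reports the exact command line so it can be copied and rerun by hand. It is written in Python, is not taken from the patch, and `run_checked` is a hypothetical helper name.

```python
import shlex
import subprocess
import sys


def run_checked(argv):
    """Run argv; on a non-zero exit, raise with the exact command line."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        cmd = shlex.join(argv)  # shell-quoted, copy-pasteable form
        raise RuntimeError(
            f"command failed with exit code {result.returncode}: {cmd}\n"
            f"stderr: {result.stderr.strip()}"
        )
    return result.stdout


# Demo with a command that always fails (stands in for a benchmark job).
try:
    run_checked([sys.executable, "-c", "import sys; sys.exit(3)"])
except RuntimeError as err:
    print(err)
```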

        jay vyas added a comment -

        FYI, I'm pretty sure this patch is complete, but let me test it one more time on my cluster to make sure it works perfectly. In the meantime, anyone else is welcome to test it and confirm.

        jay vyas added a comment - edited

        I've now tested and incorporated the final patch (16:59), and it works on my cluster. I'm glad I tested it, because there were some unnecessary changes (and a type error) in the first two. The patch I created must have come from the wrong commit.

        Anyway, this one is ready for review! Just let me know if any more changes are needed.

        Roman Shaposhnik added a comment -

        jay vyas, looks good to me. One small request for enhancement, though: please make sure that the default values are small enough for the test to pass on clusters with only a few GB of HDFS.

        jay vyas added a comment -

        Good point, Roman. Yes, it's certainly okay for small clusters:

        teragen 10 = 10 rows of 100 bytes -> 1,000 bytes -> ~1 KB
        teragen 1000 = 1000 rows of 100 bytes -> 100,000 bytes -> ~100 KB

        And the default value is 1000, so we are in good shape:

        $> ls -altrh data/teragen1000
        total 114K
        drwxr-xr-x. 39 root root 8.0K Sep 8 11:16 ..
        drwxr-xr-x. 3 root root 4.1K Sep 8 11:16 _logs
        -rwxrwxrwx. 1 root root 49K Sep 8 11:16 part-00000
        -rwxrwxrwx. 1 root root 49K Sep 8 11:16 part-00001
        -rwxrwxrwx. 1 root root 0 Sep 8 11:16 _SUCCESS
        drwxrwxrwx. 3 root root 4.1K Sep 8 11:16 .
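The sizing in this comment follows from TeraGen's fixed 100-byte rows. A minimal sketch of the arithmetic, where `teragen_bytes` and `ROW_BYTES` are illustrative names and not part of the patch:

```python
ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows


def teragen_bytes(rows: int) -> int:
    """Approximate TeraGen output size in bytes for a given row count."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    return rows * ROW_BYTES


for rows in (10, 1000):
    size = teragen_bytes(rows)
    print(f"teragen {rows}: {size} bytes (~{size / 1024:.1f} KiB)")
```

For the default of 1000 rows this gives 100,000 bytes, which is roughly consistent with the two 49K part files in the listing.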

        Roman Shaposhnik added a comment -

        +1 and committed! Thanks for the patch!


          People

          • Assignee:
            jay vyas
            Reporter:
            jay vyas
          • Votes:
            0
            Watchers:
            4
