Hadoop Map/Reduce / MAPREDUCE-751

Rumen: a tool to extract job characterization data from job tracker logs


    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: tools/rumen
    • Labels:
    • Hadoop Flags:
    • Tags: rumen, mumakil, job tracker logs


      We propose a new map/reduce component, rumen, which can be used to process job history logs to produce any or all of the following:

      • Retrospective statistics on how long it would have taken to
        launch a job into a given percentage of the mapper slots in the
        log's cluster, given the load over the period covered by the log
      • Statistics on the runtimes, shuffle times, and similar measures
        of the tasks and jobs covered by the log
      • Files describing detailed job trace information, and the
        network topology as inferred from the host locations and rack IDs
        that appear in the job tracker log

      In addition, rumen includes readers for these files that return job
      and detailed task information to other tools.

      These other tools include a more advanced version of gridmix, as well as mumak; see the blocked issues.
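To make the kinds of output listed above concrete, here is a minimal Python sketch of two of them: runtime percentiles per task type, and a rack-to-node topology inferred from host locations. It is an illustration only, not rumen's implementation or file format. The record fields (`type`, `start_ms`, `finish_ms`, `host`), the `/rack/node` host notation, and the function names are all assumptions made for this example.

```python
import math
from collections import defaultdict

# Hypothetical task records; these field names are assumptions for
# illustration and are NOT rumen's actual trace schema.
tasks = [
    {"type": "MAP", "start_ms": 0, "finish_ms": 4000, "host": "/rack-a/node1"},
    {"type": "MAP", "start_ms": 0, "finish_ms": 6000, "host": "/rack-a/node2"},
    {"type": "MAP", "start_ms": 1000, "finish_ms": 6000, "host": "/rack-b/node3"},
    {"type": "MAP", "start_ms": 2000, "finish_ms": 11000, "host": "/rack-a/node1"},
    {"type": "REDUCE", "start_ms": 6000, "finish_ms": 20000, "host": "/rack-b/node3"},
]


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def runtimes(records, task_type):
    """Runtime in milliseconds of every task of the given type."""
    return [r["finish_ms"] - r["start_ms"] for r in records if r["type"] == task_type]


def infer_topology(records):
    """Group node names by rack, using the assumed '/rack/node' host notation."""
    racks = defaultdict(set)
    for r in records:
        _, rack, node = r["host"].split("/")
        racks[rack].add(node)
    return {rack: sorted(nodes) for rack, nodes in racks.items()}


map_runtimes = runtimes(tasks, "MAP")
median_map_runtime = percentile(map_runtimes, 50)
topology = infer_topology(tasks)
```

A consumer such as a simulator or load generator could use summaries like `median_map_runtime` and `topology` as inputs in place of the raw log, which is the role the readers described above play for other tools.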

      1. mapreduce-751--2009-07-23.patch
        1012 kB
        Dick King
      2. 2009-08-19--1030.patch
        1008 kB
        Dick King
      3. 2009-08-26--1513-patch.patch
        1.44 MB
        Dick King
      4. mapreduce-751-20090826.patch
        1.44 MB
        Hong Tang
      5. mapreduce-751-20090826.patch
        1.44 MB
        Chris Douglas


            • Assignee: Dick King
            • Votes: 0
            • Watchers: 18


              • Created: