Hadoop Map/Reduce
MAPREDUCE-534

Provide accounting functionality for Hadoop resource manager

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      HADOOP-3421 describes requirements for a new resource manager in Hadoop to schedule Map/Reduce jobs. In production systems, it would be useful to produce accounting information related to the scheduling - such as job start and run times, resources used, etc. This information can be consumed by other systems to build accounting for shared resources. This JIRA is for tracking the requirements, approach and implementation for producing accounting information.

      1. history-scripts-0.18.tgz
        9 kB
        Matei Zaharia
      2. history-scripts-0.16.tgz
        9 kB
        Matei Zaharia
      3. history-scripts.patch
        32 kB
        Matei Zaharia

        Issue Links

          Activity

          Hemanth Yamijala created issue -
          Hemanth Yamijala made changes -
          Field Original Value New Value
          Link This issue is part of HADOOP-3444 [ HADOOP-3444 ]
          Hemanth Yamijala added a comment -

          This won't make 0.19. Moving out.

          Hemanth Yamijala made changes -
          Fix Version/s 0.19.0 [ 12313211 ]
          Matei Zaharia added a comment -

          A lot of this data is available in the job history logs written by the job tracker. At Facebook, we have a set of scripts that parse these logs and put the data in a MySQL database, where it can be queried. Would you be interested in something like that? Or do you want something more real-time, integrated into the Hadoop web UI?

          Hemanth Yamijala added a comment -

          Matei, the system we are currently using with HOD does something similar to what you're doing at Facebook. So, yes, we would definitely be interested to see what you have. Can you please provide this as a patch?

          Hemanth Yamijala made changes -
          Fix Version/s 0.20.0 [ 12313438 ]
          Hemanth Yamijala made changes -
          Component/s mapred [ 12310690 ]
          Component/s contrib/capacity-sched [ 12312466 ]
          Matei Zaharia added a comment -

          I've attached a patch with the scripts I built at Facebook, along with a readme explaining what they do. There are two scripts. The first parses the history logs and jobconfs and loads the raw data into MySQL, creating tables of jobs, jobconf XML key-value pairs, tasks, and task attempts. The second performs joins on these tables to produce a set of job summary reports (~1 KB of data per job) that can be queried quickly for visualization. You can build a visualization on top of this database using your favorite tool; unfortunately I can't open-source the one used at Facebook because it depends on a lot of internal web libraries.

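The first stage of the two-script pipeline described above (parse history-log lines into normalized tables) can be sketched roughly as follows. The log-line layout, field names, and the use of sqlite3 instead of MySQL are illustrative assumptions for this sketch, not the contents of the actual patch:

```python
import re
import sqlite3

# Hypothetical history-log line; the real JobTracker format differs in detail:
#   Job JOBID="job_200809180916_0001" JOBNAME="wordcount" SUBMIT_TIME="1221750000"
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_history_line(line):
    """Split one history-log line into (record_type, {field: value})."""
    record_type, _, rest = line.partition(" ")
    return record_type, dict(FIELD_RE.findall(rest))

# In-memory database standing in for the MySQL tables of jobs, tasks, etc.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_id TEXT, name TEXT, submit_time INTEGER)")

sample = 'Job JOBID="job_200809180916_0001" JOBNAME="wordcount" SUBMIT_TIME="1221750000"'
rtype, fields = parse_history_line(sample)
if rtype == "Job":
    conn.execute(
        "INSERT INTO jobs VALUES (?, ?, ?)",
        (fields["JOBID"], fields["JOBNAME"], int(fields["SUBMIT_TIME"])),
    )
```

The second script would then run SQL joins over these tables to build the per-job summary rows.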
          Matei Zaharia made changes -
          Attachment history-scripts.patch [ 12392682 ]
          Matei Zaharia added a comment -

          Here is a set of scripts for Hadoop 0.16, for those interested. I am also currently investigating what needs to be done to get the scripts working on 0.18.

          Matei Zaharia made changes -
          Attachment history-scripts-0.16.tgz [ 12392894 ]
          Matei Zaharia added a comment -

          Here are some modified scripts for 0.18 too. It looks like some naming in the history log changed slightly, unfortunately (e.g. task attempts being called task_something instead of tip_something, or counters being written as key:value instead of key=value). Are you planning to finalize the history log format sometime? It would make it easier to do accounting after the fact using tools like this.

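A small normalization layer is one way to make a single set of scripts tolerate both of the format differences mentioned above. The helper names and the comma-separated counter string below are hypothetical assumptions for illustration:

```python
def parse_counters(raw):
    """Parse a counters string, accepting both the 0.16 'key=value' and
    the 0.18 'key:value' styles, comma-separated (an assumed layout)."""
    counters = {}
    for part in raw.split(","):
        if "=" in part:
            key, _, value = part.partition("=")
        else:
            key, _, value = part.partition(":")
        counters[key.strip()] = int(value)
    return counters

def normalize_attempt_id(attempt_id):
    """Rewrite a 0.16-style 'tip_...' identifier to the 0.18 'task_...' prefix
    so both versions share one key space in the database."""
    if attempt_id.startswith("tip_"):
        return "task_" + attempt_id[len("tip_"):]
    return attempt_id
```

With helpers like these, the rest of the loading code can stay version-agnostic, which is also why a frozen history-log format would simplify after-the-fact accounting.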
          Matei Zaharia made changes -
          Attachment history-scripts-0.18.tgz [ 12392896 ]
          Nigel Daley made changes -
          Fix Version/s 0.20.0 [ 12313438 ]
          Owen O'Malley made changes -
          Project Hadoop Common [ 12310240 ] Hadoop Map/Reduce [ 12310941 ]
          Key HADOOP-3708 MAPREDUCE-534
          Component/s contrib/capacity-sched [ 12312466 ]
          Allen Wittenauer added a comment -

          The basics of this are in place, and other projects such as LinkedIn's White Elephant do more. Closing this.

          Allen Wittenauer made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Transition: Open → Resolved
          Time In Source Status: 2201d 23h 53m
          Execution Times: 1
          Last Executer: Allen Wittenauer
          Last Execution Date: 18/Jul/14 18:46

            People

            • Assignee: Hemanth Yamijala
            • Reporter: Hemanth Yamijala
            • Votes: 0
            • Watchers: 12

              Dates

              • Created:
                Updated:
                Resolved:
