Hadoop Map/Reduce / MAPREDUCE-778

[Rumen] Need a standalone JobHistory log anonymizer

Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.23.0, 2.0.0-alpha
    • 0.23.1
    • tools/rumen
    • Reviewed
    • Added an anonymizer tool to Rumen. Anonymizer takes a Rumen trace file and/or topology as input. It supports persistence and plugins to override the default behavior.
    • rumen anonymization

    Description

      Job history logs contain a rich set of information that can help understand and characterize cluster workload and individual job execution. Examples of work that parses or utilizes job history include HADOOP-3585, MAPREDUCE-534, HDFS-459, MAPREDUCE-728, and MAPREDUCE-776. Some of the parsing tools developed in previous work already contain a component to anonymize the logs. It would be nice to combine these efforts into a common standalone tool that anonymizes job history logs while preserving as much of the file structure as possible, so that existing tools built on top of job history logs continue to work without modification.
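
To illustrate the idea of structure-preserving anonymization, here is a minimal sketch. It is not the attached anonymizer.py or the Rumen implementation. It assumes log lines made of KEY="value" pairs, and the set of sensitive keys (USER, JOBNAME, HOSTNAME) is a hypothetical choice for the example. Each distinct value is replaced by a stable placeholder, so the same user or job name maps to the same token everywhere and downstream parsers still see the original line layout.

```python
import re

# Hypothetical set of sensitive keys; the real tool's list is not given here.
SENSITIVE_KEYS = {"USER", "JOBNAME", "HOSTNAME"}

# Matches KEY="value" pairs (assumed line format for this sketch).
PAIR = re.compile(r'(\w+)="([^"]*)"')


class Anonymizer:
    """Maps each distinct sensitive value to a stable placeholder."""

    def __init__(self):
        # (key, original value) -> placeholder
        self.mapping = {}

    def _placeholder(self, key, value):
        if (key, value) not in self.mapping:
            # Number placeholders per key in first-seen order.
            count = sum(1 for k, _ in self.mapping if k == key)
            self.mapping[(key, value)] = f"{key.lower()}{count}"
        return self.mapping[(key, value)]

    def anonymize_line(self, line):
        """Rewrite sensitive values in place; leave everything else untouched."""
        def replace(match):
            key, value = match.group(1), match.group(2)
            if key in SENSITIVE_KEYS:
                value = self._placeholder(key, value)
            return f'{key}="{value}"'
        return PAIR.sub(replace, line)
```

Because the mapping lives on the object, it could be saved and reloaded between runs so that several log files are anonymized consistently; that persistence step is left out of this sketch.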

      Attachments

        1. anonymizer.py
          10 kB
          Guanying Wang
        2. same.py
          2 kB
          Guanying Wang
        3. ASF.LICENSE.NOT.GRANTED--anonymizer.patch
          20 kB
          Guanying Wang
        4. mapreduce-778-v1.2-2.patch
          91 kB
          Amar Kamat
        5. mapreduce-778-v1.14-12.patch
          220 kB
          Amar Kamat
        6. mapreduce-778-v1.14-14.patch
          220 kB
          Amar Kamat
        7. MAPREDUCE-778_branch0.23.patch
          220 kB
          Alejandro Abdelnur

            People

              amar_kamat Amar Kamat
              hong.tang Hong Tang