  1. Hadoop Map/Reduce
  2. MAPREDUCE-6613

Change mapreduce.jobhistory.jhist.format default from json to binary

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Default of 'mapreduce.jobhistory.jhist.format' property changed from 'json' to 'binary'. Creates smaller, binary Avro .jhist files for faster JHS performance.

      Description

      MAPREDUCE-6376 added a configuration setting that selects the internal format of .jhist files:

      mapreduce.jobhistory.jhist.format

      Currently, the default is "json". Changing the default to "binary" allows faster parsing, with the downside that the file is no longer human-readable when printed with "hadoop fs -cat".
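
      For illustration, a cluster that still needs text .jhist files could presumably pin the old value explicitly. The fragment below is a minimal sketch, assuming the override is placed in mapred-site.xml; only the property name and the two values ("json" and "binary") come from this issue.

      ```xml
      <!-- Assumed location: mapred-site.xml -->
      <configuration>
        <property>
          <!-- Internal format of .jhist files: "json" (text Avro) or "binary" (binary Avro) -->
          <name>mapreduce.jobhistory.jhist.format</name>
          <value>json</value>
        </property>
      </configuration>
      ```

      Leaving the property unset picks up whatever default ships with the release in use.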

          Activity

          aw Allen Wittenauer added a comment -

          What does "binary" actually mean? If it's protobuf, then it should really be "protobuf". If it's avro, then it should really be "avro".

          rchiang Ray Chiang added a comment -

          Initial version.

          rchiang Ray Chiang added a comment -

          It's always Avro. Your choices are Avro in json/text format or Avro in binary format.

          aw Allen Wittenauer added a comment -

          Ugh. That's even worse. Our inability to actually say what things are is really terrible.

          hadoopqa Hadoop QA added a comment -
          -1 overall

          Vote | Subsystem | Runtime | Comment
          0 | reexec | 0m 0s | Docker mode activated.
          +1 | @author | 0m 0s | The patch does not contain any @author tags.
          -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          0 | mvndep | 0m 48s | Maven dependency ordering for branch
          +1 | mvninstall | 10m 44s | trunk passed
          +1 | compile | 2m 44s | trunk passed with JDK v1.8.0_66
          +1 | compile | 2m 28s | trunk passed with JDK v1.7.0_91
          +1 | checkstyle | 0m 27s | trunk passed
          +1 | mvnsite | 1m 19s | trunk passed
          +1 | mvneclipse | 0m 37s | trunk passed
          -1 | findbugs | 1m 37s | hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core in trunk has 2 extant Findbugs warnings.
          +1 | javadoc | 1m 12s | trunk passed with JDK v1.8.0_66
          +1 | javadoc | 1m 12s | trunk passed with JDK v1.7.0_91
          0 | mvndep | 0m 23s | Maven dependency ordering for patch
          +1 | mvninstall | 1m 4s | the patch passed
          +1 | compile | 2m 42s | the patch passed with JDK v1.8.0_66
          +1 | javac | 2m 42s | the patch passed
          +1 | compile | 2m 28s | the patch passed with JDK v1.7.0_91
          +1 | javac | 2m 28s | the patch passed
          +1 | checkstyle | 0m 26s | the patch passed
          +1 | mvnsite | 1m 14s | the patch passed
          +1 | mvneclipse | 0m 29s | the patch passed
          +1 | whitespace | 0m 0s | Patch has no whitespace issues.
          +1 | xml | 0m 1s | The patch has no ill-formed XML file.
          +1 | findbugs | 3m 17s | the patch passed
          +1 | javadoc | 1m 9s | the patch passed with JDK v1.8.0_66
          +1 | javadoc | 1m 7s | the patch passed with JDK v1.7.0_91
          +1 | unit | 3m 1s | hadoop-mapreduce-client-core in the patch passed with JDK v1.8.0_66.
          +1 | unit | 1m 3s | hadoop-mapreduce-client-common in the patch passed with JDK v1.8.0_66.
          +1 | unit | 2m 56s | hadoop-mapreduce-client-core in the patch passed with JDK v1.7.0_91.
          +1 | unit | 1m 1s | hadoop-mapreduce-client-common in the patch passed with JDK v1.7.0_91.
          +1 | asflicense | 0m 24s | Patch does not generate ASF License warnings.
           |  | 48m 55s |

          Subsystem | Report/Notes
          Docker | Image:yetus/hadoop:0ca8df7
          JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12783721/MAPREDUCE-6613.001.patch
          JIRA Issue | MAPREDUCE-6613
          Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
          uname | Linux e4591860ed53 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool | maven
          Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision | trunk / 99829eb
          Default Java | 1.7.0_91
          Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs | v3.0.0
          findbugs | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6278/artifact/patchprocess/branch-findbugs-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core-warnings.html
          JDK v1.7.0_91 Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6278/testReport/
          modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common U: hadoop-mapreduce-project/hadoop-mapreduce-client
          Max memory used | 76MB
          Powered by | Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6278/console

          This message was automatically generated.

          rkanter Robert Kanter added a comment -

          This seems like a good default to change:

          • It gives a ~2x speedup (see MAPREDUCE-6376)
          • The parsing code can already handle either format
          • I don't think we should worry about users catting the jhist file directly; that's not an official stable "API". If someone really wants to do that, they can set the config back to "json". Otherwise, defaulting to "binary" should help the largest number of users.

          Jason Lowe, what do you think?

          aw Allen Wittenauer added a comment -

          "I don't think we should worry about users catting the jhist file directly"

          Umm, that's probably the #1 way people are doing post-processing.

          rchiang Ray Chiang added a comment -

          Allen Wittenauer, I had specifically put a target version of 3.0, since this is "under the covers" not backwards compatible. Even in that case, would you consider that unacceptable?

          aw Allen Wittenauer added a comment -

          Breaking it in 3.0 (with a release note, of course) is exactly the right thing to do.

          rkanter Robert Kanter added a comment -

          That seems reasonable. Ray Chiang, can you write something in the release notes box?

          rchiang Ray Chiang added a comment -

          Done.

          jlowe Jason Lowe added a comment -

          I have no issues with this going into 3.0. I agree with Allen that people have built pipelines today that consume the jhist files, so it's risky to change the default in anything before 3.x.

          rkanter Robert Kanter added a comment -

          +1 for 3.0.
          Will commit tomorrow if nobody objects.

          rkanter Robert Kanter added a comment -

          Thanks everyone. Committed to trunk!

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9333 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9333/)
          MAPREDUCE-6613. Change mapreduce.jobhistory.jhist.format default from (rkanter: rev 6eae4337d1929077ffa74734327775fb987ba910)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/jobhistory/JHAdminConfig.java
          rchiang Ray Chiang added a comment -

          Thanks for the feedback everyone.


            People

            • Assignee: Ray Chiang (rchiang)
            • Reporter: Ray Chiang (rchiang)
            • Votes: 0
            • Watchers: 7
