Uploaded image for project: 'Hadoop Map/Reduce'
  1. Hadoop Map/Reduce
  2. MAPREDUCE-6793

io.sort.factor code default and mapred-default.xml values inconsistent

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 2.6.5, 3.0.0-alpha1
    • Fix Version/s: 2.9.0, 3.0.0-alpha2
    • Component/s: task
    • Labels:
    • Hadoop Flags:
      Reviewed

      Description

      The actual default value in mapred-default.xml:

      <property>
        <name>mapreduce.task.io.sort.factor</name>
        <value>10</value>
        <description>The number of streams to merge at once while sorting
        files.  This determines the number of open file handles.</description>
      </property>
      

      However, MapTask and MergeManagerImpl, are coded with:

             
       int mergeFactor = job.getInt(JobContext.IO_SORT_FACTOR, 100);
      
      1. MAPREDUCE-6793.003.patch
        5 kB
        Gera Shegalov
      2. 0002-MAPREDUCE-6793.patch
        4 kB
        Prabhu Joseph
      3. 0001-MAPREDUCE-6793.patch
        1 kB
        Prabhu Joseph

        Activity

        Hide
        jira.shegalov Gera Shegalov added a comment -

        Thanks everyone!

        Show
        jira.shegalov Gera Shegalov added a comment - Thanks everyone!
        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10869 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10869/)
        MAPREDUCE-6793. io.sort.factor code default and mapred-default.xml (rohithsharmaks: rev 683e0c71fe09600a24bdd7b707a613fe70ff1f6e)

        • (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java
        • (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
        • (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
        • (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestMergeManager.java
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10869 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10869/ ) MAPREDUCE-6793 . io.sort.factor code default and mapred-default.xml (rohithsharmaks: rev 683e0c71fe09600a24bdd7b707a613fe70ff1f6e) (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestMergeManager.java
        Hide
        rohithsharma Rohith Sharma K S added a comment -

        committed to trunk/branch-2.. thanks Gera Shegalov and Prabhu for the patch.. thanks Akira Ajisaka for additional review.

        Show
        rohithsharma Rohith Sharma K S added a comment - committed to trunk/branch-2.. thanks Gera Shegalov and Prabhu for the patch.. thanks Akira Ajisaka for additional review.
        Hide
        rohithsharma Rohith Sharma K S added a comment -

        +1 LGTM, will commit later of the day.

        Show
        rohithsharma Rohith Sharma K S added a comment - +1 LGTM, will commit later of the day.
        Hide
        ajisakaa Akira Ajisaka added a comment -

        LGTM, +1.

        Show
        ajisakaa Akira Ajisaka added a comment - LGTM, +1.
        Hide
        jira.shegalov Gera Shegalov added a comment -

        Rohith Sharma K S, Sunil G, and others watching, a review of the latest patch MAPREDUCE-6793.003.patch is appreciated.

        Show
        jira.shegalov Gera Shegalov added a comment - Rohith Sharma K S , Sunil G , and others watching, a review of the latest patch MAPREDUCE-6793 .003.patch is appreciated.
        Hide
        jira.shegalov Gera Shegalov added a comment -

        the checkstyle warnings don't seem to be worth fixing.

        ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java:1824: private void mergeParts() throws IOException, InterruptedException, :5: Method length is 151 lines (max allowed is 150).

        Either this or the line width violation

        ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java:218: public static final int DEFAULT_IO_SORT_FACTOR = 10;:3: Redundant 'public' modifier.

        ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java:222: public static final int DEFAULT_IO_SORT_MB = 100;:3: Redundant 'public' modifier.

        The 2 above are for consistency with the rest of MRJobConfig. We can clean up 'public static final' for all fields in a separate JIRA

        Show
        jira.shegalov Gera Shegalov added a comment - the checkstyle warnings don't seem to be worth fixing. ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java:1824: private void mergeParts() throws IOException, InterruptedException, :5: Method length is 151 lines (max allowed is 150). Either this or the line width violation ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java:218: public static final int DEFAULT_IO_SORT_FACTOR = 10;:3: Redundant 'public' modifier. ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java:222: public static final int DEFAULT_IO_SORT_MB = 100;:3: Redundant 'public' modifier. The 2 above are for consistency with the rest of MRJobConfig. We can clean up 'public static final' for all fields in a separate JIRA
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 18s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 7m 3s trunk passed
        +1 compile 0m 24s trunk passed
        +1 checkstyle 0m 27s trunk passed
        +1 mvnsite 0m 29s trunk passed
        +1 mvneclipse 0m 14s trunk passed
        +1 findbugs 0m 48s trunk passed
        +1 javadoc 0m 21s trunk passed
        +1 mvninstall 0m 23s the patch passed
        +1 compile 0m 21s the patch passed
        +1 javac 0m 21s the patch passed
        -1 checkstyle 0m 24s hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core: The patch generated 3 new + 849 unchanged - 0 fixed = 852 total (was 849)
        +1 mvnsite 0m 26s the patch passed
        +1 mvneclipse 0m 10s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 0m 51s the patch passed
        +1 javadoc 0m 18s the patch passed
        +1 unit 2m 44s hadoop-mapreduce-client-core in the patch passed.
        +1 asflicense 0m 16s The patch does not generate ASF License warnings.
        16m 33s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:e809691
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12838322/MAPREDUCE-6793.003.patch
        JIRA Issue MAPREDUCE-6793
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 561167cbca9c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 86ac1ad
        Default Java 1.8.0_101
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6806/artifact/patchprocess/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt
        Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6806/testReport/
        modules C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core
        Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6806/console
        Powered by Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 18s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 7m 3s trunk passed +1 compile 0m 24s trunk passed +1 checkstyle 0m 27s trunk passed +1 mvnsite 0m 29s trunk passed +1 mvneclipse 0m 14s trunk passed +1 findbugs 0m 48s trunk passed +1 javadoc 0m 21s trunk passed +1 mvninstall 0m 23s the patch passed +1 compile 0m 21s the patch passed +1 javac 0m 21s the patch passed -1 checkstyle 0m 24s hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core: The patch generated 3 new + 849 unchanged - 0 fixed = 852 total (was 849) +1 mvnsite 0m 26s the patch passed +1 mvneclipse 0m 10s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 0m 51s the patch passed +1 javadoc 0m 18s the patch passed +1 unit 2m 44s hadoop-mapreduce-client-core in the patch passed. +1 asflicense 0m 16s The patch does not generate ASF License warnings. 16m 33s Subsystem Report/Notes Docker Image:yetus/hadoop:e809691 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12838322/MAPREDUCE-6793.003.patch JIRA Issue MAPREDUCE-6793 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 561167cbca9c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 86ac1ad Default Java 1.8.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6806/artifact/patchprocess/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6806/testReport/ modules C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6806/console Powered by Apache Yetus 0.3.0 http://yetus.apache.org This message was automatically generated.
        Hide
        jira.shegalov Gera Shegalov added a comment -

        Hi Prabhu Joseph Let us include the test I provided so we have a green jenkins.
        For consistency, let us use MRJobConfig.IO_SORT_FACTOR everywhere instead of referring to it via subclass JobContext.IO_SORT_FACTOR

        nit: Some lines appear longer than 80 chars. Please fix it.

        Show
        jira.shegalov Gera Shegalov added a comment - Hi Prabhu Joseph Let us include the test I provided so we have a green jenkins. For consistency, let us use MRJobConfig.IO_SORT_FACTOR everywhere instead of referring to it via subclass JobContext.IO_SORT_FACTOR nit: Some lines appear longer than 80 chars. Please fix it.
        Hide
        sunilg Sunil G added a comment -

        +1. Patch looks fine for me.

        Show
        sunilg Sunil G added a comment - +1. Patch looks fine for me.
        Hide
        rohithsharma Rohith Sharma K S added a comment -

        Latest patch looks fine to me.. Gera Shegalov do you have any comments?

        Show
        rohithsharma Rohith Sharma K S added a comment - Latest patch looks fine to me.. Gera Shegalov do you have any comments?
        Hide
        Prabhu Joseph Prabhu Joseph added a comment -

        Resubmitting a patch adapting Gera Shegalov comment.

        Show
        Prabhu Joseph Prabhu Joseph added a comment - Resubmitting a patch adapting Gera Shegalov comment.
        Hide
        rohithsharma Rohith Sharma K S added a comment -

        ahh.. thanks for pointing out.. lets change it to 10..

        Show
        rohithsharma Rohith Sharma K S added a comment - ahh.. thanks for pointing out.. lets change it to 10..
        Hide
        jira.shegalov Gera Shegalov added a comment -

        Rohith Sharma K S, this 100 in code is irrelevant because of the definition in mapred-default.xml that is inside mapreduce jar. We should include this kind of unit test in the patch, maybe in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestMergeManager.java

          @Test
          public void testDefaultIoSortFactor() {
            final JobConf jobConf = new JobConf();
            assertEquals(10, jobConf.getInt(MRJobConfig.IO_SORT_FACTOR, 100));
          }
        

        to demonstrate it.

        Show
        jira.shegalov Gera Shegalov added a comment - Rohith Sharma K S , this 100 in code is irrelevant because of the definition in mapred-default.xml that is inside mapreduce jar. We should include this kind of unit test in the patch, maybe in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestMergeManager.java @Test public void testDefaultIoSortFactor() { final JobConf jobConf = new JobConf(); assertEquals(10, jobConf.getInt(MRJobConfig.IO_SORT_FACTOR, 100)); } to demonstrate it.
        Hide
        rohithsharma Rohith Sharma K S added a comment -

        I think we should retain the default value as same as current one i.e 100. Changing in code default value MRJobConfig.DEFAULT_IO_SORT_FACTOR to 10 would reflect job performance which are using default values.

        Show
        rohithsharma Rohith Sharma K S added a comment - I think we should retain the default value as same as current one i.e 100. Changing in code default value MRJobConfig.DEFAULT_IO_SORT_FACTOR to 10 would reflect job performance which are using default values.
        Hide
        jira.shegalov Gera Shegalov added a comment -

        Hi Prabhu Joseph, thanks for working this JIRA. I suggest we don't change the effective default in this JIRA since it requires a separate discussion, and has implication on the number of random vs sequential iops + memory consumption. I would rather suggest just a cosmetic change to introduce a constant DEFAULT_IO_SORT_FACTOR = 10 in MRJobConfig and change code to

        .getInt(MRJobConfig.IO_SORT_FACTOR, MRJobConfig.DEFAULT_IO_SORT_FACTOR)

        in MapTask.java and MergeManagerImpl.java

        Rohith Sharma K S is it ok with you?

        Show
        jira.shegalov Gera Shegalov added a comment - Hi Prabhu Joseph , thanks for working this JIRA. I suggest we don't change the effective default in this JIRA since it requires a separate discussion, and has implication on the number of random vs sequential iops + memory consumption. I would rather suggest just a cosmetic change to introduce a constant DEFAULT_IO_SORT_FACTOR = 10 in MRJobConfig and change code to .getInt(MRJobConfig.IO_SORT_FACTOR, MRJobConfig.DEFAULT_IO_SORT_FACTOR) in MapTask.java and MergeManagerImpl.java Rohith Sharma K S is it ok with you?
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 16s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 mvninstall 6m 59s trunk passed
        +1 compile 0m 23s trunk passed
        +1 mvnsite 0m 27s trunk passed
        +1 mvneclipse 0m 13s trunk passed
        +1 javadoc 0m 22s trunk passed
        +1 mvninstall 0m 22s the patch passed
        +1 compile 0m 20s the patch passed
        +1 javac 0m 20s the patch passed
        +1 mvnsite 0m 25s the patch passed
        +1 mvneclipse 0m 10s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 xml 0m 1s The patch has no ill-formed XML file.
        +1 javadoc 0m 18s the patch passed
        +1 unit 2m 45s hadoop-mapreduce-client-core in the patch passed.
        +1 asflicense 0m 17s The patch does not generate ASF License warnings.
        13m 48s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:9560f25
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12835477/0001-MAPREDUCE-6793.patch
        JIRA Issue MAPREDUCE-6793
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit xml
        uname Linux 78848f3714d7 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 9f32364
        Default Java 1.8.0_101
        Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6775/testReport/
        modules C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core
        Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6775/console
        Powered by Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 16s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 mvninstall 6m 59s trunk passed +1 compile 0m 23s trunk passed +1 mvnsite 0m 27s trunk passed +1 mvneclipse 0m 13s trunk passed +1 javadoc 0m 22s trunk passed +1 mvninstall 0m 22s the patch passed +1 compile 0m 20s the patch passed +1 javac 0m 20s the patch passed +1 mvnsite 0m 25s the patch passed +1 mvneclipse 0m 10s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 xml 0m 1s The patch has no ill-formed XML file. +1 javadoc 0m 18s the patch passed +1 unit 2m 45s hadoop-mapreduce-client-core in the patch passed. +1 asflicense 0m 17s The patch does not generate ASF License warnings. 13m 48s Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12835477/0001-MAPREDUCE-6793.patch JIRA Issue MAPREDUCE-6793 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit xml uname Linux 78848f3714d7 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 9f32364 Default Java 1.8.0_101 Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6775/testReport/ modules C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6775/console Powered by Apache Yetus 0.3.0 http://yetus.apache.org This message was automatically generated.
        Hide
        rohithsharma Rohith Sharma K S added a comment -

        +1 , pending jenkins

        Show
        rohithsharma Rohith Sharma K S added a comment - +1 , pending jenkins
        Hide
        Prabhu Joseph Prabhu Joseph added a comment -

        Hi Gera Shegalov, i would like to provide a patch for this. Assigned it to me.

        Show
        Prabhu Joseph Prabhu Joseph added a comment - Hi Gera Shegalov , i would like to provide a patch for this. Assigned it to me.

          People

          • Assignee:
            Prabhu Joseph Prabhu Joseph
            Reporter:
            jira.shegalov Gera Shegalov
          • Votes:
            0 Vote for this issue
            Watchers:
            11 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development