Hadoop Map/Reduce / MAPREDUCE-6357

MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: documentation
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      After spending the afternoon debugging a user job whose reduce tasks were failing on retry with the exception below, I think it would be worthwhile to add a note to the MultipleOutputs.write() documentation saying that absolute paths may cause tasks to execute improperly on retry or when MR speculative execution is enabled.

      2015-04-28 23:13:10,452 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: File already exists:wasb://full20150320@bgtstoragefull.blob.core.windows.net/user/hadoop/some/path/block-r-00299.bz2
             at org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1354)
             at org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1195)
             at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
             at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
             at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
             at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:135)
             at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:475)
             at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:433)
             at com.ancestry.bigtree.hadoop.LevelReducer.processValue(LevelReducer.java:91)
             at com.ancestry.bigtree.hadoop.LevelReducer.reduce(LevelReducer.java:69)
             at com.ancestry.bigtree.hadoop.LevelReducer.reduce(LevelReducer.java:14)
             at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
             at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
             at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
             at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
             at java.security.AccessController.doPrivileged(Native Method)
             at javax.security.auth.Subject.doAs(Subject.java:415)
             at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
             at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
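
      The failure mode in the trace above can be reproduced in miniature without Hadoop. The following is only a sketch under stated assumptions: it uses java.nio on a local temp directory, and the class name CommitSketch, its method names, and the "_temporary/attempt_N" layout are hypothetical simplifications, not Hadoop's actual committer code. It contrasts a task attempt that creates its file directly at the final location with one that writes into a per-attempt directory and is promoted on commit.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not part of Hadoop.
final class CommitSketch {

    /** Writes straight to the final location, failing if the file already exists. */
    public static void writeDirect(Path finalFile, String data) throws IOException {
        Files.createDirectories(finalFile.getParent());
        // CREATE_NEW behaves like a create() without overwrite.
        Files.write(finalFile, data.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE_NEW);
    }

    /** Writes into a scratch directory that is private to one task attempt. */
    public static Path writeToAttemptDir(Path outputDir, int attempt,
                                         String name, String data) throws IOException {
        Path attemptDir = outputDir.resolve("_temporary").resolve("attempt_" + attempt);
        Files.createDirectories(attemptDir);
        Path tmp = attemptDir.resolve(name);
        Files.write(tmp, data.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE_NEW);
        return tmp;
    }

    /** Promotes the file of the one successful attempt into the output directory. */
    public static Path commit(Path attemptFile, Path outputDir) throws IOException {
        Path target = outputDir.resolve(attemptFile.getFileName());
        return Files.move(attemptFile, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("commit-sketch");

        // Attempt 0 writes directly and then dies; the retry trips over the leftover file.
        Path direct = root.resolve("elsewhere").resolve("block-r-00299");
        writeDirect(direct, "partial");
        try {
            writeDirect(direct, "retry");
        } catch (FileAlreadyExistsException e) {
            System.out.println("retry failed, file already exists: " + e.getFile());
        }

        // With per-attempt directories, attempt 1 never sees attempt 0's leftovers.
        Path out = root.resolve("job-out");
        writeToAttemptDir(out, 0, "block-r-00299", "partial");
        Path good = writeToAttemptDir(out, 1, "block-r-00299", "complete");
        Path committed = commit(good, out);
        System.out.println(new String(Files.readAllBytes(committed), StandardCharsets.UTF_8));
    }
}
```

      In the direct case the second attempt has no way to tell its own leftovers from real output, which is the situation the reduce task retry was in.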
      

      As discussed in MAPREDUCE-3772, when the baseOutputPath passed to MultipleOutputs.write() is an absolute path (or more precisely a path that resolves outside of the job output-dir), the concept of output committing is not utilized.
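
      As a rough illustration of "resolves outside of the job output-dir", the check below resolves a baseOutputPath against an output directory using java.net.URI. This is a sketch, not Hadoop's own path resolution (which goes through org.apache.hadoop.fs.Path and the committer's work path); the class name OutputPathCheck, the method name, and the example paths are assumptions made for illustration.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of Hadoop.
final class OutputPathCheck {

    /**
     * Returns true if baseOutputPath, resolved against the job output
     * directory, still lies inside that directory.
     */
    public static boolean resolvesInsideOutputDir(String jobOutputDir, String baseOutputPath) {
        // A trailing slash makes URI.resolve treat the output dir as a directory.
        String dir = jobOutputDir.endsWith("/") ? jobOutputDir : jobOutputDir + "/";
        URI base = URI.create(dir).normalize();
        URI resolved = base.resolve(baseOutputPath).normalize();
        return resolved.toString().startsWith(base.toString());
    }

    public static void main(String[] args) {
        String out = "hdfs://nn/user/hadoop/job-out";
        // Relative path: stays under the output dir.
        System.out.println(resolvesInsideOutputDir(out, "levels/block"));
        // Absolute path elsewhere on the same file system: escapes the output dir.
        System.out.println(resolvesInsideOutputDir(out, "/user/hadoop/some/path/block"));
        // Relative path that climbs out with "..": also escapes.
        System.out.println(resolvesInsideOutputDir(out, "../other/block"));
        // Fully qualified URI on another file system: escapes.
        System.out.println(resolvesInsideOutputDir(out, "wasb://c@acct.blob.core.windows.net/some/path/block"));
    }
}
```

      Only the first call returns true; under the behaviour described above, the other three are the kind of path for which output committing would not apply.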

      In this case, the user had read through the MultipleOutputs docs and assumed that everything would work fine, since there are blog posts saying that MultipleOutputs does handle output commit.

        Activity

        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Hdfs-trunk #2221 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2221/)
        MAPREDUCE-6357. MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute. Contributed by Dustin Cote. (aajisaka: rev 2ba90c93d71aa2d30ee9ed431750c10c685e5599)

        • hadoop-mapreduce-project/CHANGES.txt
        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #283 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/283/)
        MAPREDUCE-6357. MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute. Contributed by Dustin Cote. (aajisaka: rev 2ba90c93d71aa2d30ee9ed431750c10c685e5599)

        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java
        • hadoop-mapreduce-project/CHANGES.txt
        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2240 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2240/)
        MAPREDUCE-6357. MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute. Contributed by Dustin Cote. (aajisaka: rev 2ba90c93d71aa2d30ee9ed431750c10c685e5599)

        • hadoop-mapreduce-project/CHANGES.txt
        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #294 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/294/)
        MAPREDUCE-6357. MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute. Contributed by Dustin Cote. (aajisaka: rev 2ba90c93d71aa2d30ee9ed431750c10c685e5599)

        • hadoop-mapreduce-project/CHANGES.txt
        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #1024 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1024/)
        MAPREDUCE-6357. MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute. Contributed by Dustin Cote. (aajisaka: rev 2ba90c93d71aa2d30ee9ed431750c10c685e5599)

        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java
        • hadoop-mapreduce-project/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #291 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/291/)
        MAPREDUCE-6357. MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute. Contributed by Dustin Cote. (aajisaka: rev 2ba90c93d71aa2d30ee9ed431750c10c685e5599)

        • hadoop-mapreduce-project/CHANGES.txt
        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java
        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-trunk-Commit #8333 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8333/)
        MAPREDUCE-6357. MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute. Contributed by Dustin Cote. (aajisaka: rev 2ba90c93d71aa2d30ee9ed431750c10c685e5599)

        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java
        • hadoop-mapreduce-project/CHANGES.txt
        ajisakaa Akira Ajisaka added a comment -

        I've committed this to trunk and branch-2. Thanks Dustin Cote for the contribution.

        ajisakaa Akira Ajisaka added a comment -

        +1, committing this.

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote | Subsystem | Runtime | Comment
        -1 | pre-patch | 16m 15s | Findbugs (version ) appears to be broken on trunk.
        +1 | @author | 0m 0s | The patch does not contain any @author tags.
        -1 | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 | javac | 7m 44s | There were no new javac warning messages.
        +1 | javadoc | 10m 16s | There were no new javadoc warning messages.
        +1 | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings.
        +1 | checkstyle | 0m 29s | There were no new checkstyle issues.
        +1 | whitespace | 0m 0s | The patch has no lines that end in whitespace.
        +1 | install | 1m 27s | mvn install still works.
        +1 | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse.
        +1 | findbugs | 1m 31s | The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        +1 | mapreduce tests | 1m 52s | Tests passed in hadoop-mapreduce-client-core.
        | | 40m 38s |



        Subsystem | Report/Notes
        Patch URL | http://issues.apache.org/jira/secure/attachment/12749289/MAPREDUCE-6357-1.patch
        Optional Tests | javadoc javac unit findbugs checkstyle
        git revision | trunk / b6265d3
        hadoop-mapreduce-client-core test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5932/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt
        Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5932/testReport/
        Java | 1.7.0_55
        uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5932/console

        This message was automatically generated.

        cotedm Dustin Cote added a comment -

        Fixing a typo and checkstyle warning. No tests since this is a doc change.

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote | Subsystem | Runtime | Comment
        0 | pre-patch | 16m 36s | Pre-patch trunk compilation is healthy.
        +1 | @author | 0m 0s | The patch does not contain any @author tags.
        -1 | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 | javac | 7m 52s | There were no new javac warning messages.
        +1 | javadoc | 9m 53s | There were no new javadoc warning messages.
        +1 | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings.
        -1 | checkstyle | 0m 46s | The applied patch generated 3 new checkstyle issues (total was 29, now 32).
        +1 | whitespace | 0m 0s | The patch has no lines that end in whitespace.
        +1 | install | 1m 23s | mvn install still works.
        +1 | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse.
        +1 | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        +1 | mapreduce tests | 1m 47s | Tests passed in hadoop-mapreduce-client-core.
        | | 40m 43s |



        Subsystem | Report/Notes
        Patch URL | http://issues.apache.org/jira/secure/attachment/12749275/MAPREDUCE-6357-1.patch
        Optional Tests | javadoc javac unit findbugs checkstyle
        git revision | trunk / b6265d3
        checkstyle | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5931/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-core.txt
        hadoop-mapreduce-client-core test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5931/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt
        Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5931/testReport/
        Java | 1.7.0_55
        uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5931/console

        This message was automatically generated.

        cotedm Dustin Cote added a comment -

        Submitting the javadoc changes. Please let me know if anything looks amiss. Thanks!

        cotedm Dustin Cote added a comment -

        Thanks Ivan Mitic! I ran into this same situation just the other day, so it's fresh in my mind. I'll try to get a first draft of the change done this week.

        ivanmi Ivan Mitic added a comment -

        Thanks Dustin Cote, please feel free to take it up.

        cotedm Dustin Cote added a comment -

        Ivan Mitic, do you have plans to work on this one? If not, I can go ahead and make the change.


          People

          • Assignee: Dustin Cote (cotedm)
          • Reporter: Ivan Mitic (ivanmi)
          • Votes: 0
          • Watchers: 8
