Hadoop Map/Reduce / MAPREDUCE-5912

Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha1
    • Fix Version/s: 3.0.0-alpha1
    • Component/s: client
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed
    • Tags:
      windows

      Description

      @@ -1098,8 +1120,8 @@ private long calculateOutputSize() throws IOException {
           if (isMapTask() && conf.getNumReduceTasks() > 0) {
             try {
               Path mapOutput =  mapOutputFile.getOutputFile();
      -        FileSystem localFS = FileSystem.getLocal(conf);
      -        return localFS.getFileStatus(mapOutput).getLen();
      +        FileSystem fs = mapOutput.getFileSystem(conf);
      +        return fs.getFileStatus(mapOutput).getLen();
             } catch (IOException e) {
               LOG.warn ("Could not find output size " , e);
             }
      

      The change above, made in MAPREDUCE-5196, causes Windows local output files to be routed through HDFS:

      2014-06-02 00:14:53,891 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IllegalArgumentException: Pathname /c:/Hadoop/Data/Hadoop/local/usercache/HadoopUser/appcache/application_1401693085139_0001/output/attempt_1401693085139_0001_m_000000_0/file.out from c:/Hadoop/Data/Hadoop/local/usercache/HadoopUser/appcache/application_1401693085139_0001/output/attempt_1401693085139_0001_m_000000_0/file.out is not a valid DFS filename.
             at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:187)
             at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:101)
             at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1024)
             at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1020)
             at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
             at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1020)
             at org.apache.hadoop.mapred.Task.calculateOutputSize(Task.java:1124)
             at org.apache.hadoop.mapred.Task.sendLastUpdate(Task.java:1102)
             at org.apache.hadoop.mapred.Task.done(Task.java:1048)
      
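The failure mode in the description can be modeled without Hadoop on the classpath. The sketch below is a simplified illustration, not Hadoop code: the class `SchemelessPathSketch`, all of its method names, and the shortened path `c:/Hadoop/Data/file.out` are hypothetical, and the validity rule (a DFS name must start with `/` and no component may contain `:`) is an assumption inferred from the "is not a valid DFS filename" message above. It shows that a Windows drive path carries no scheme, so a lookup based on the path alone falls back to the default filesystem, which then rejects the name.

```java
import java.util.Optional;

/**
 * Hypothetical, simplified model of the bug. None of these names exist in Hadoop.
 */
class SchemelessPathSketch {

    /** Returns the URI-style scheme of a path, treating "c:" as a drive letter, not a scheme. */
    static Optional<String> schemeOf(String path) {
        int colon = path.indexOf(':');
        int slash = path.indexOf('/');
        // No colon at all, or the colon appears after the first slash: no scheme.
        if (colon <= 0 || (slash >= 0 && slash < colon)) {
            return Optional.empty();
        }
        // A single letter before the colon is assumed to be a Windows drive.
        if (colon == 1 && Character.isLetter(path.charAt(0))) {
            return Optional.empty();
        }
        return Optional.of(path.substring(0, colon));
    }

    /** Uses the path's own scheme if present, otherwise the configured default. */
    static String resolveScheme(String path, String defaultScheme) {
        return schemeOf(path).orElse(defaultScheme);
    }

    /** Mirrors the log above, where "c:/..." is reported as pathname "/c:/...". */
    static String toDfsPathname(String path) {
        return path.startsWith("/") ? path : "/" + path;
    }

    /** Assumed rule: must start with "/" and no component may contain ":". */
    static boolean isValidDfsName(String name) {
        if (!name.startsWith("/")) {
            return false;
        }
        for (String component : name.split("/")) {
            if (component.contains(":")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String mapOutput = "c:/Hadoop/Data/file.out"; // shortened, illustrative path
        String scheme = resolveScheme(mapOutput, "hdfs");
        System.out.println("resolved scheme: " + scheme);
        System.out.println("valid DFS name:  " + isValidDfsName(toDfsPathname(mapOutput)));
    }
}
```

Under these assumptions, asking the local filesystem for the file status directly, as the original code did, avoids the scheme lookup entirely.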

          Activity

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1800 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1800/)
          MAPREDUCE-5912. Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196. Contributed by Remus Rusanu. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602282)

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java

          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #1773 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1773/)
          MAPREDUCE-5912. Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196. Contributed by Remus Rusanu. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602282)

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #582 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/582/)
          MAPREDUCE-5912. Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196. Contributed by Remus Rusanu. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602282)

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java

          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #5691 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5691/)
          MAPREDUCE-5912. Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196. Contributed by Remus Rusanu. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602282)

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java

          cnauroth Chris Nauroth added a comment -

          I committed this patch to trunk. Remus, thank you for contributing the fix.

          chris.douglas Chris Douglas added a comment -

          Chris Nauroth wrote: "If in the future we want to revisit the idea of map outputs going somewhere different than the local file system, then I think we'd need a different patch. I think we'd want to make sure that the map output's Path instance contains an explicit scheme, so that the code here doesn't need to assume local vs. default vs. something else."

          Agreed. MAPREDUCE-5269 changed all Path instances returned from YARNOutputFiles to be fully qualified, but the two changes were separated.

          +1 for committing the workaround until HADOOP-10663 is ready.

          cnauroth Chris Nauroth added a comment -

          +1 for this patch.

          Remus Rusanu, Carlo Curino and Chris Douglas, my understanding is that MAPREDUCE-5196 accidentally introduced this bug, but this part of the change is not strictly necessary for the goals of MAPREDUCE-5196. Based on that, I'm in favor of committing this patch to revert just the part of MAPREDUCE-5196 that caused the bug. The alternative patch on the Path class posted in HADOOP-10663 has some other potential side effects, so I prefer doing a localized fix here in MR. (I'll enter more details on HADOOP-10663.)

          If in the future we want to revisit the idea of map outputs going somewhere different than the local file system, then I think we'd need a different patch. I think we'd want to make sure that the map output's Path instance contains an explicit scheme, so that the code here doesn't need to assume local vs. default vs. something else.

          Can you let me know if you agree with committing this and not committing HADOOP-10663? I'll hold off on committing until I hear from one of you.
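The point about giving the map output's Path an explicit scheme can be illustrated with the Java standard library alone. This is an analogy under stated assumptions rather than the patch discussed here: `ExplicitSchemeSketch`, `qualifyLocal`, and `fileSystemFor` are hypothetical names, and `java.nio.file.Path` and `java.net.URI` stand in for Hadoop's own path type. The idea is that once a local path is qualified up front, later code can select a filesystem from the scheme and fail loudly when none is present, instead of guessing between local and default.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical sketch: qualify local paths once, then dispatch on the scheme.
 */
class ExplicitSchemeSketch {

    /** Converts a local path string into a URI that always carries the "file" scheme. */
    static URI qualifyLocal(String localPath) {
        Path absolute = Paths.get(localPath).toAbsolutePath().normalize();
        return absolute.toUri();
    }

    /** Selects a filesystem by the URI's own scheme and refuses to guess if it is missing. */
    static String fileSystemFor(URI uri) {
        String scheme = uri.getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("Path has no explicit scheme: " + uri);
        }
        return scheme;
    }

    public static void main(String[] args) {
        URI qualified = qualifyLocal("output/file.out"); // illustrative relative path
        System.out.println(qualified + " -> " + fileSystemFor(qualified));
    }
}
```

In Hadoop itself, the closest counterpart is probably qualifying the path against the local filesystem, for example with `FileSystem.getLocal(conf).makeQualified(path)`; whether that fits the map output code path is not established by this issue.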

          rusanu Remus Rusanu added a comment -

          I also posted a patch that solves HADOOP-10663. I guess if that is accepted, this is obsolete.

          chris.douglas Chris Douglas added a comment -

          As you identified in HADOOP-10663, returning the default filesystem for local paths is not correct.

          rusanu Remus Rusanu added a comment -

          No new tests included because this is a revert of an earlier breaking change. Manually validated the change on Windows.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12648337/MAPREDUCE-5912.1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4642//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4642//console

          This message is automatically generated.


            People

            • Assignee:
              rusanu Remus Rusanu
            • Reporter:
              rusanu Remus Rusanu