Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1522

FileInputFormat may change the file system of an input path

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.3
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(Job job, Path path) uses the default FileSystem but not the FileSystem specified in the path.

      //org.apache.hadoop.mapreduce.lib.input.FileInputFormat
        public static void addInputPath(Job job, 
                                        Path path) throws IOException {
          Configuration conf = job.getConfiguration();
          FileSystem fs = FileSystem.get(conf);
          path = path.makeQualified(fs); // the original FileSystem is lost.
          ...
        }
      

      There is a similar problem in FileInputFormat.setInputPaths(..).

      1. m1552_20100223_0.20.patch
        2 kB
        Tsz Wo Nicholas Sze
      2. M1522-1v21.patch
        4 kB
        Chris Douglas
      3. M1522-1v20.patch
        4 kB
        Chris Douglas
      4. M1522-1.patch
        4 kB
        Chris Douglas
      5. m1522_20100301.patch
        4 kB
        Tsz Wo Nicholas Sze
      6. m1522_20100301_0.21.patch
        5 kB
        Tsz Wo Nicholas Sze
      7. m1522_20100301_0.20.patch
        4 kB
        Tsz Wo Nicholas Sze
      8. m1522_20100224.patch
        4 kB
        Tsz Wo Nicholas Sze
      9. m1522_20100224_0.20.patch
        4 kB
        Tsz Wo Nicholas Sze

        Issue Links

          Activity

          Hide
          Tsz Wo Nicholas Sze added a comment -

          A simple fix is to use Path.getFileSystem(conf).

          diff --git src/mapred/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java
           src/mapred/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java
          index a2b6c49..04adeb0 100644
          --- src/mapred/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java
          +++ src/mapred/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java
          @@ -354,7 +355,7 @@ public abstract class FileInputFormat<K, V> extends InputFormat<K, V> {
             public static void addInputPath(Job job, 
                                             Path path) throws IOException {
               Configuration conf = job.getConfiguration();
          -    FileSystem fs = FileSystem.get(conf);
          +    final FileSystem fs = path.getFileSystem(conf);
               path = path.makeQualified(fs);
               String dirStr = StringUtils.escapeString(path.toString());
               String dirs = conf.get("mapred.input.dir");
          

          We should also check the old org.apache.hadoop.mapred.FileInputFormat for this problem.

          Show
          Tsz Wo Nicholas Sze added a comment - A simple fix is to use Path.getFileSystem(conf). diff --git src/mapred/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java src/mapred/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java index a2b6c49..04adeb0 100644 --- src/mapred/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java +++ src/mapred/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java @@ -354,7 +355,7 @@ public abstract class FileInputFormat<K, V> extends InputFormat<K, V> { public static void addInputPath(Job job, Path path) throws IOException { Configuration conf = job.getConfiguration(); - FileSystem fs = FileSystem.get(conf); + final FileSystem fs = path.getFileSystem(conf); path = path.makeQualified(fs); String dirStr = StringUtils.escapeString(path.toString()); String dirs = conf.get( "mapred.input.dir" ); We should also check the old org.apache.hadoop.mapred.FileInputFormat for this problem.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          m1552_20100223_0.20.patch: uses the fs specified in the input paths. Will add some tests.

          Show
          Tsz Wo Nicholas Sze added a comment - m1552_20100223_0.20.patch: uses the fs specified in the input paths. Will add some tests.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          m1522_20100224_0.20.patch: added a test

          Show
          Tsz Wo Nicholas Sze added a comment - m1522_20100224_0.20.patch: added a test
          Hide
          Tsz Wo Nicholas Sze added a comment -

          m1522_20100224.patch: for trunk.

          Show
          Tsz Wo Nicholas Sze added a comment - m1522_20100224.patch: for trunk.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12436958/m1522_20100224.patch
          against trunk revision 915223.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/488/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/488/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/488/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/488/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12436958/m1522_20100224.patch against trunk revision 915223. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/488/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/488/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/488/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/488/console This message is automatically generated.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          The failed TestMiniMRLocalFS and TestJobTrackerStart are not related to this. Both of them have some history of failure.

          Show
          Tsz Wo Nicholas Sze added a comment - The failed TestMiniMRLocalFS and TestJobTrackerStart are not related to this. Both of them have some history of failure.
          Hide
          Amareshwari Sriramadasu added a comment -

          Looked at the patch for trunk.
          Some comments:
          1. The method Path.makeQualified(FileSystem) is deprecated. Can you change the call to something like inputPath.getFileSystem(conf).makeQualified(inputPath)?
          2. In the testcase, can you change the calls conf.get(mapred.input.dir) to FileInputFormat.getInputPaths(job).

          Show
          Amareshwari Sriramadasu added a comment - Looked at the patch for trunk. Some comments: 1. The method Path.makeQualified(FileSystem) is deprecated. Can you change the call to something like inputPath.getFileSystem(conf).makeQualified(inputPath)? 2. In the testcase, can you change the calls conf.get(mapred.input.dir) to FileInputFormat.getInputPaths(job).
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Amareshwari, thanks for the review.

          m1522_20100224.patch: incorporated both review comments.

          Without the changes in FileInputFormat, it will fail on the new test as following.

          defaultfs.getUri() = s3://abc:xyz@hostname
          original = file:/foo
          results = [file://abc:xyz@hostname/foo]
          ------------- ---------------- ---------------
          
          Testcase: testAddInputPath took 0.704 sec
          	FAILED
          expected:<file:/foo> but was:<file://abc:xyz@hostname/foo>
          junit.framework.AssertionFailedError: expected:<file:/foo> but was:<file://abc:xyz@hostname/foo>
          	at org.apache.hadoop.mapreduce.lib.input.TestFileInputFormat.testAddInputPath(TestFileInputFormat.java:54)
          
          Show
          Tsz Wo Nicholas Sze added a comment - Amareshwari, thanks for the review. m1522_20100224.patch: incorporated both review comments. Without the changes in FileInputFormat, it will fail on the new test as following. defaultfs.getUri() = s3://abc:xyz@hostname original = file:/foo results = [file://abc:xyz@hostname/foo] ------------- ---------------- --------------- Testcase: testAddInputPath took 0.704 sec FAILED expected:<file:/foo> but was:<file://abc:xyz@hostname/foo> junit.framework.AssertionFailedError: expected:<file:/foo> but was:<file://abc:xyz@hostname/foo> at org.apache.hadoop.mapreduce.lib.input.TestFileInputFormat.testAddInputPath(TestFileInputFormat.java:54)
          Hide
          Tsz Wo Nicholas Sze added a comment -

          > m1522_20100224.patch: incorporated both review comments.

          Oops, the file should be m1522_20100301.patch.

          Show
          Tsz Wo Nicholas Sze added a comment - > m1522_20100224.patch: incorporated both review comments. Oops, the file should be m1522_20100301.patch.
          Hide
          Amareshwari Sriramadasu added a comment -

          +1 Changes look fine.

          Show
          Amareshwari Sriramadasu added a comment - +1 Changes look fine.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12437503/m1522_20100301.patch
          against trunk revision 918206.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/14/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/14/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/14/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/14/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12437503/m1522_20100301.patch against trunk revision 918206. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/14/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/14/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/14/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/14/console This message is automatically generated.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          > -1 core tests. The patch failed core unit tests.
          TestMiniMRLocalFS.testWithLocal failed with age=7. It is not related to this.

          Show
          Tsz Wo Nicholas Sze added a comment - > -1 core tests. The patch failed core unit tests. TestMiniMRLocalFS.testWithLocal failed with age=7. It is not related to this.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          m1522_20100301_0.20.patch: for 0.20

          Use m1522_20100301.patch for 0.21.

          Show
          Tsz Wo Nicholas Sze added a comment - m1522_20100301_0.20.patch: for 0.20 Use m1522_20100301.patch for 0.21.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          > Use m1522_20100301.patch for 0.21.
          The patch currently is not complied on 0.21 since mockito is missing. I will fix it.

          Show
          Tsz Wo Nicholas Sze added a comment - > Use m1522_20100301.patch for 0.21. The patch currently is not complied on 0.21 since mockito is missing. I will fix it.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          m1522_20100301_0.21.patch: for 0.21, added mockito to ivy.

          Show
          Tsz Wo Nicholas Sze added a comment - m1522_20100301_0.21.patch: for 0.21, added mockito to ivy.
          Hide
          Chris Douglas added a comment -
          • Setting fs.default.name prior to creating the Job should make the mockito part of the test unnecessary, avoiding pushing the ivy changes through 0.21 and 0.20.
          • Consider JUnit4 annotations instead of extending TestCase
          Show
          Chris Douglas added a comment - Setting fs.default.name prior to creating the Job should make the mockito part of the test unnecessary, avoiding pushing the ivy changes through 0.21 and 0.20. Consider JUnit4 annotations instead of extending TestCase
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12437996/M1522-1.patch
          against trunk revision 919335.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/343/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/343/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/343/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/343/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12437996/M1522-1.patch against trunk revision 919335. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/343/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/343/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/343/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/343/console This message is automatically generated.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          +1 The changes look good.
          Thanks, Chris.

          Show
          Tsz Wo Nicholas Sze added a comment - +1 The changes look good. Thanks, Chris.
          Hide
          Chris Douglas added a comment -

          All test failures are due to Hudson's recent abject uselessness:

          java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
          	at java.lang.Class.forName0(Native Method)
          	at java.lang.Class.forName(Class.java:169)
          Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
          	at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
          	at java.security.AccessController.doPrivileged(Native Method)
          	at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
          	at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
          	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
          	at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
          	at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
          

          The former test results hold, as:

          1. The original run failed only TestMRLocalFS due to MAPREDUCE-1520
          2. The only changes are in the unit test, which passes
          Show
          Chris Douglas added a comment - All test failures are due to Hudson's recent abject uselessness: java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:169) Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration at java.net.URLClassLoader$1.run(URLClassLoader.java:200) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:188) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:252) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) The former test results hold, as: The original run failed only TestMRLocalFS due to MAPREDUCE-1520 The only changes are in the unit test, which passes
          Hide
          Chris Douglas added a comment -

          +1

          I committed this. Thanks, Nicholas!

          Show
          Chris Douglas added a comment - +1 I committed this. Thanks, Nicholas!
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #265 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/265/)
          . FileInputFormat may use the default FileSystem for the
          input path. Contributed by Tsz Wo (Nicholas), SZE

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #265 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/265/ ) . FileInputFormat may use the default FileSystem for the input path. Contributed by Tsz Wo (Nicholas), SZE
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #250 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/250/)
          . FileInputFormat may use the default FileSystem for the
          input path. Contributed by Tsz Wo (Nicholas), SZE

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #250 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/250/ ) . FileInputFormat may use the default FileSystem for the input path. Contributed by Tsz Wo (Nicholas), SZE

            People

            • Assignee:
              Tsz Wo Nicholas Sze
              Reporter:
              Tsz Wo Nicholas Sze
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development