Hadoop Map/Reduce: MAPREDUCE-3193

FileInputFormat doesn't read files recursively in the input path dir

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.2, 2.0.0-alpha, 3.0.0
    • Fix Version/s: 3.0.0, 0.23.10, 2.1.1-beta
    • Component/s: mrv1, mrv2
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      A java.io.FileNotFoundException is thrown if the input file is more than one folder level deep, and the job fails.
      Example: the input file is /r1/r2/input.txt

      1. MAPREDUCE-3193.patch
        6 kB
        Devaraj K
      2. MAPREDUCE-3193-1.patch
        8 kB
        Devaraj K
      3. MAPREDUCE-3193-2.patch
        13 kB
        Devaraj K
      4. MAPREDUCE-3193.security.patch
        15 kB
        Devaraj K
      5. MAPREDUCE-3193-2.patch
        13 kB
        Devaraj K
      6. MAPREDUCE-3193-3.patch
        12 kB
        Devaraj K
      7. MAPREDUCE-3193-4.patch
        12 kB
        Devaraj K
      8. MAPREDUCE-3193-5.patch
        12 kB
        Devaraj K


          Activity

          Mahadev konar added a comment -

          Ramgopal,
          What input format are you using?

          Ramgopal N added a comment -

          I have executed the Wordcount job from examples.jar. It uses FileInputFormat.

          Devaraj K added a comment -

          Here is the problem. In FileInputFormat.listStatus, the directory branch only lists one nested level and adds everything it finds, directories included, as if they were files. Splits are then created for directories and the task fails.

          FileInputFormat.java
          for (int i = 0; i < dirs.length; ++i) {
            Path p = dirs[i];
            FileSystem fs = p.getFileSystem(job.getConfiguration());
            FileStatus[] matches = fs.globStatus(p, inputFilter);
            if (matches == null) {
              errors.add(new IOException("Input path does not exist: " + p));
            } else if (matches.length == 0) {
              errors.add(new IOException("Input Pattern " + p + " matches 0 files"));
            } else {
              for (FileStatus globStat: matches) {
                if (globStat.isDirectory()) {
                  // Only one level deep: nested directories returned by listStatus
                  // are added to 'result' as if they were files.
                  for (FileStatus stat: fs.listStatus(globStat.getPath(),
                      inputFilter)) {
                    result.add(stat);
                  }
                } else {
                  result.add(globStat);
                }
              }
            }
          }
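
          For illustration only, here is a hedged sketch (not one of the attached patches; the class name and paths are made up) that reproduces the one-level listing described above, assuming a directory /r1 on the default FileSystem that contains only the nested path /r1/r2/input.txt. Running it prints /r1/r2 with isDirectory=true, and it is exactly that entry that later turns into an unreadable split.

          import java.io.IOException;
          import java.util.ArrayList;
          import java.util.List;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileStatus;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          public class OneLevelListingDemo {
            public static void main(String[] args) throws IOException {
              Configuration conf = new Configuration();
              Path input = new Path("/r1");                  // hypothetical input dir
              FileSystem fs = input.getFileSystem(conf);
              List<FileStatus> result = new ArrayList<FileStatus>();
              // Same shape as the directory branch above: one listStatus call, no recursion.
              for (FileStatus stat : fs.listStatus(input)) {
                result.add(stat);                            // /r1/r2 is added here as if it were a file
              }
              for (FileStatus stat : result) {
                System.out.println(stat.getPath() + " isDirectory=" + stat.isDirectory());
              }
            }
          }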
          
          Arun C Murthy added a comment -

          This isn't something we should change lightly, it's probably going to break user apps.

          At least, we need a config to turn this on, and it should be off by default.

          Devaraj K added a comment -

          This isn't something we should change lightly, it's probably going to break user apps.

          If the input path contains a nested dir, it is treated as a file and the task is attempted on it, which fails with the error below. Failing the job itself when the input path contains a nested dir might not be correct.

          Caused by: java.io.FileNotFoundException: File does not exist: /r1/r2
                  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:736)
                  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:699)
                  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:671)
                  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:315)
                  at org.apache.hadoop.hdfs.protocolR23Compatible.ClientNamenodeProtocolServerSideTranslatorR23.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorR23.java:130)
                  at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
                  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                  at java.lang.reflect.Method.invoke(Method.java:597)
                  at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:632)
                  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1517)
                  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1513)
                  at java.security.AccessController.doPrivileged(Native Method)
                  at javax.security.auth.Subject.doAs(Subject.java:396)
                  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
                  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1511)
          
                  at org.apache.hadoop.ipc.Client.call(Client.java:1085)
                  at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:244)
                  at $Proxy8.getBlockLocations(Unknown Source)
                  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                  at java.lang.reflect.Method.invoke(Method.java:597)
                  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:130)
                  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:81)
                  at $Proxy8.getBlockLocations(Unknown Source)
                  at org.apache.hadoop.hdfs.protocolR23Compatible.ClientNamenodeProtocolTranslatorR23.getBlockLocations(ClientNamenodeProtocolTranslatorR23.java:150)
                  at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:566)
                  ... 14 more
          
          Arun C Murthy added a comment -

          If the input path contains a nested dir, it is treated as a file and the task is attempted on it, which fails with the error below. Failing the job itself when the input path contains a nested dir might not be correct.

          Yes, I realize that.

          FileInputFormat has had this behaviour for a long while, and changing it now (we definitely shouldn't do this for hadoop-0.23.0) will probably affect a lot of apps. Hence, at the very least, we need to have this off by default.

          Devaraj K added a comment -

          I updated the patch to make this change configurable; by default it uses the old behaviour.
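
          As a rough illustration of that shape, here is a hedged sketch (not the attached patch; the class and method names are made up, and the property name is the one this thread later settles on). The flag defaults to false, which keeps the old one-level behaviour of adding whatever listStatus returns, directories included; when set to true, directories are descended into instead.

          import java.io.IOException;
          import java.util.List;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileStatus;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.fs.PathFilter;

          public class GatedDirectoryExpansion {
            static final String INPUT_DIR_RECURSIVE =
                "mapreduce.input.fileinputformat.input.dir.recursive";

            /** Expands one matched directory into 'result', recursing only when enabled. */
            static void expandDirectory(Configuration conf, FileSystem fs, Path dir,
                PathFilter inputFilter, List<FileStatus> result) throws IOException {
              boolean recursive = conf.getBoolean(INPUT_DIR_RECURSIVE, false); // old behaviour by default
              for (FileStatus stat : fs.listStatus(dir, inputFilter)) {
                if (recursive && stat.isDirectory()) {
                  expandDirectory(conf, fs, stat.getPath(), inputFilter, result);
                } else {
                  // With the flag off this matches the old code: one level, everything added.
                  result.add(stat);
                }
              }
            }
          }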

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12499866/MAPREDUCE-3193-1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1082//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1082//artifact/trunk/hadoop-mapreduce-project/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1082//artifact/trunk/hadoop-mapreduce-project/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1082//artifact/trunk/hadoop-mapreduce-project/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1082//console

          This message is automatically generated.

          Devaraj K added a comment -

          -1 findbugs. The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings.

          These Findbugs warnings are not related to the patch.

          Mahadev konar added a comment -

          Devaraj,
          Can you please run ant test with the patch? There are quite a few tests for FileInputFormat in ant which are not run via the Hudson bot.

          Devaraj K added a comment -

          Hi Mahadev,

          I ran ant test; there are no failures because of this patch. Some tests always fail in my env, and they are not related to this.

          Anyway, it uses the old behavior by default; if we want to read recursively, we need to set that property value to true.

          Tom White added a comment -

          MAPREDUCE-1501 added this behaviour to the old API. Can you change your patch to share code and tests so that both the old and new API behave in the same way? Also, the old configuration parameter should be deprecated, but still supported in the new API.
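
          For context, here is a hedged sketch of how the old key can remain supported through Hadoop's Configuration deprecation mechanism (illustrative user-level code only; in the change that was eventually committed this wiring appears to live in ConfigUtil.java rather than in user code, and the class name here is made up):

          import org.apache.hadoop.conf.Configuration;

          public class RecursiveKeyDeprecation {
            public static void main(String[] args) {
              // Map the old (mapred) key onto the new (mapreduce) key so that
              // setting either one has the same effect.
              Configuration.addDeprecation("mapred.input.dir.recursive",
                  new String[] { "mapreduce.input.fileinputformat.input.dir.recursive" });

              Configuration conf = new Configuration();
              conf.setBoolean("mapred.input.dir.recursive", true);
              // Reading the new key now reflects the value set via the deprecated one.
              System.out.println(conf.getBoolean(
                  "mapreduce.input.fileinputformat.input.dir.recursive", false)); // prints true
            }
          }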

          Mahadev konar added a comment -

          Cancelling the patch to address Tom's comments.

          Prashant Sharma added a comment -

          Folks, is this WIP? Can you also provide a fix for 0.20.205?

          Devaraj K added a comment -

          Yes Prashant. I will provide a patch for 0.20.205 also. Thanks.

          Devaraj K added a comment -

          Attached the patches for trunk and the security branch, addressing the review comments.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12505105/MAPREDUCE-3193.security.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1343//console

          This message is automatically generated.

          Devaraj K added a comment -

          Resubmitting the patch to trigger Hudson with the patch against the trunk branch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12505206/MAPREDUCE-3193-2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The applied patch generated 1818 javac compiler warnings (more than the trunk's current 1817 warnings).

          -1 findbugs. The patch appears to introduce 12 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1344//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1344//artifact/trunk/hadoop-mapreduce-project/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-examples.html
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1344//console

          This message is automatically generated.

          Devaraj K added a comment -

          -1 javac. The applied patch generated 1818 javac compiler warnings (more than the trunk's current 1817 warnings).

          It is due to the deprecated MiniDFSCluster used in the new test added to TestFileInputFormat.

          -1 findbugs. The patch appears to introduce 12 new Findbugs (version 1.3.9) warnings.

          These warnings are not introduced by this patch.

          Shilo Ayalon added a comment -

          I'm having the same problem with hadoop-1.0.2. Given the following directory structure (in hdfs):

          one/
          ├── three/
          │   └── four/
          │       ├── baz.txt
          │       ├── bleh.txt
          │       └── foo.txt
          └── two/
              ├── bar.txt
              └── gaa.txt
          

          As no recursive path support is available, I'm walking the root folder and adding all subdirs to the job. However, adding file-less folders like one and one/three as input paths to the job raises this exception:

          java.io.FileNotFoundException: /user/hduser/data/one (Is a directory)
          

          The actual number of files present is massive (20k/30k+), so passing them all on the command line seems impractical. Will this patch be added to the source at some point?
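
          For reference, here is a hedged sketch of this kind of client-side walk, adding individual files rather than sub-directories so that file-less folders are skipped (written against the newer mapreduce API; the class and helper names are made up). With tens of thousands of files this produces tens of thousands of input path entries, which is exactly the boilerplate a recursive listing flag would remove.

          import java.io.IOException;

          import org.apache.hadoop.fs.FileStatus;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.mapreduce.Job;
          import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

          public class AddFilesRecursively {
            /** Walks 'dir' and adds every regular file it finds as a job input path. */
            static void addFiles(Job job, FileSystem fs, Path dir) throws IOException {
              for (FileStatus stat : fs.listStatus(dir)) {
                if (stat.isDirectory()) {
                  addFiles(job, fs, stat.getPath());   // descend into one/, one/three/, ...
                } else {
                  FileInputFormat.addInputPath(job, stat.getPath());
                }
              }
            }
          }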

          Devaraj K added a comment -

          It has been in 'Patch Available' for some time, and it would be good to have it in.

          Can anyone take a look at the patch and review it? Thanks.

          Harsh J added a comment -

          Thanks for the patches. This should get in soon because there is a wide divide between the old and new API feature sets.

          I have a few questions though:

          MAPREDUCE-1501 added this behaviour to the old API. Can you change your patch to share code and tests so that both the old and new API behave in the same way? Also, the old configuration parameter should be deprecated, but still supported in the new API.

          Given that both APIs are now supported, do we really need the deprecation? Will the new name apply to both? Are other properties handled in the same way today?

          For example I see in old API the following reuse:

          public static final String NUM_INPUT_FILES =
              org.apache.hadoop.mapreduce.lib.input.FileInputFormat.NUM_INPUT_FILES;
          

          This patch, however, does not change similar things in mapred.lib even after the deprecation marker. Can this be done here too?

          + <groupId>org.apache.hadoop</groupId>
          + <artifactId>hadoop-hdfs</artifactId>

          Can the test not be done with just LFS? We can avoid a dependency if it can be done. Similarly a LJRunner test would be great too, if alright - instead of an MR cluster.

          mapreduce.input.fileinputformat.readinputfilesrecursively

          The last part could still be improved, I think. (Nit: it's not reading recursively, just listing that way.) Perhaps "mapreduce.input.fileinputformat.input.dir.recursive" is simpler to have?

          Devaraj K added a comment -

          Thanks Harsh for looking into the patch. Thanks a lot for your time.

          Given that both APIs are now supported, do we really need the deprecation? Will the new name apply to both? Are other properties handled in the same way today?

          For example I see in old API the following reuse:

          public static final String NUM_INPUT_FILES =
              org.apache.hadoop.mapreduce.lib.input.FileInputFormat.NUM_INPUT_FILES;
          

          While this patch does not change similar things in mapred.lib even after deprecation marker. Can this be done here too?

          The deprecation is still needed. I should reuse the new name in the old API as is done for other properties. I will update it in the next patch.

          Can the test not be done with just LFS? We can avoid a dependency if it can be done. Similarly a LJRunner test would be great too, if alright - instead of an MR cluster.

          I don't see any problem with having the test cases this way other than adding the dependency. I feel it would be good to have this test case with MiniDFSCluster.

          The last part can still be bettered I think. (Nit: Its not reading recursively, just listing that way.) Perhaps "mapreduce.input.fileinputformat.input.dir.recursive" is simpler to have?

          I thought that, from the external perspective, this is not used for anything other than reading. Anyway, we can change the name to "mapreduce.input.fileinputformat.input.dir.recursive". I will update it in the next patch.

          Please let me know if there are any other nits/suggestions, and I will update the patch.

          Harsh J added a comment -

          Thanks Harsh for looking into the patch. Thanks a lot for your time.

          I've been busy lately, but I plan to go over your other patches soon. Sorry for the delays on your requests!

          I don't see any problem with having the test cases this way other than adding the dependency. I feel it would be good to have this test case with MiniDFSCluster.

          The problem is test-run time. With LFS/LJR it's pretty fast, and if the test isn't HDFS-specific and has nothing to do with the MR framework in particular, we can avoid using mini clusters.

          I thought that, from the external perspective, this is not used for anything other than reading. Anyway, we can change the name to "mapreduce.input.fileinputformat.input.dir.recursive". I will update it in the next patch.

          Config elements are also utilized by streaming users, since there's no other way for them to specify this than with -D. Readability helps in general, wherever it is.

          Devaraj K added a comment -

          Updated the patch for trunk.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12530373/MAPREDUCE-3193-3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2426//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2426//console

          This message is automatically generated.

          Devaraj K added a comment -

          Hi Harsh, can you take a look at the updated patch when you find some time?

          Amar Kamat added a comment -

          Devaraj,
          I quickly reviewed the latest patch and here are some comments:
          1. I see a lot of similarity between the two implementations of FileInputFormat. Is there any possibility of reusing code? At least the split creation (and all the methods involved with it) could be factored out.
          2. Test case to make sure that the two configs (mapred.input.dir.recursive & mapreduce.input.fileinputformat.input.dir.recursive) are essentially linked.

          Harsh J, is it okay if I take this to completion?

          Devaraj K added a comment -

          Cancelling the patch to address Amar's comments.

          Devaraj K added a comment -

          Thanks a lot Amar for looking into the patch. Sorry for the delay in updating the patch.

          1. I see a lot of similarity between the two implementations of FileInputFormat. Is there any possibility of reusing code? At least the split creation (and all the methods involved with it) could be factored out.

          I don't think it is good to refactor the code as part of this JIRA. We can do that in a separate JIRA.

          2. Test case to make sure that the two configs (mapred.input.dir.recursive & mapreduce.input.fileinputformat.input.dir.recursive) are essentially linked

          I have updated the test case to check with these two configurations.

          Please review the updated patch.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12587785/MAPREDUCE-3193-4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3770//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3770//console

          This message is automatically generated.

          Amar Kamat added a comment -

          +1

          Jason Lowe added a comment -

          Thanks, Devaraj, looks pretty good to me as well. I agree with Harsh J that mapreduce.input.fileinputformat.input.dir.recursive is more descriptive/accurate, can we also change the getter/setter functions to match? They still refer to reading recursively – something like get/setInputDirRecursive would be better.

          Also one minor nit: there's a double semicolon after the call to getReadFilesRecursively.

          Devaraj K added a comment -

          Thanks Amar and Jason.

          I have updated the patch with the suggested changes. Please review it.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12590385/MAPREDUCE-3193-5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3821//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3821//console

          This message is automatically generated.

          Jason Lowe added a comment -

          +1, will commit later today

          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #4033 (See https://builds.apache.org/job/Hadoop-trunk-Commit/4033/)
          MAPREDUCE-3193. FileInputFormat doesn't read files recursively in the input path dir. Contributed by Devaraj K (Revision 1499125)

          Result = SUCCESS
          jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1499125
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/ConfigUtil.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestFileInputFormat.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestFileInputFormat.java
          Jason Lowe added a comment -

          Thanks Devaraj, and to all others who contributed to the review! I committed this to trunk, branch-2, and branch-0.23.

          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #259 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/259/)
          MAPREDUCE-3193. FileInputFormat doesn't read files recursively in the input path dir. Contributed by Devaraj K (Revision 1499125)

          Result = FAILURE
          jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1499125
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/ConfigUtil.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestFileInputFormat.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestFileInputFormat.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #657 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/657/)
          svn merge -c 1499125 FIXES: MAPREDUCE-3193. FileInputFormat doesn't read files recursively in the input path dir. Contributed by Devaraj K (Revision 1499131)

          Result = SUCCESS
          jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1499131
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/ConfigUtil.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestFileInputFormat.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestFileInputFormat.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1449 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1449/)
          MAPREDUCE-3193. FileInputFormat doesn't read files recursively in the input path dir. Contributed by Devaraj K (Revision 1499125)

          Result = FAILURE
          jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1499125
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/ConfigUtil.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestFileInputFormat.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestFileInputFormat.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1476 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1476/)
          MAPREDUCE-3193. FileInputFormat doesn't read files recursively in the input path dir. Contributed by Devaraj K (Revision 1499125)

          Result = SUCCESS
          jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1499125
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/ConfigUtil.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestFileInputFormat.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestFileInputFormat.java
          rulinma added a comment -

          I don't think it is a bug. You can use FileInputFormat.addInputPaths(job, inputPath) to replace it.

          Jason Lowe added a comment -

          I don't think it is a bug. You can use FileInputFormat.addInputPaths(job, inputPath) to replace it.

          The heart of the issue is consistency between org.apache.hadoop.mapred.FileInputFormat and org.apache.hadoop.mapreduce.lib.input.FileInputFormat. The former supports recursive processing of input paths when mapred.input.dir.recursive is true while the latter did not. This changes org.apache.hadoop.mapreduce.lib.input.FileInputFormat to match that behavior so users of the mapreduce API can easily process input paths recursively.
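
          To make that concrete, here is a hedged usage sketch of enabling the behaviour in each API (the class is made up; the setter follows the get/setInputDirRecursive naming agreed on above, so treat the details as illustrative rather than as authoritative API documentation):

          import java.io.IOException;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.mapred.JobConf;
          import org.apache.hadoop.mapreduce.Job;
          import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

          public class EnableRecursiveInput {
            /** Old (mapred) API: the flag that MAPREDUCE-1501 introduced. */
            static void oldApi(JobConf conf) {
              conf.setBoolean("mapred.input.dir.recursive", true);
              org.apache.hadoop.mapred.FileInputFormat.addInputPath(conf, new Path("/r1"));
            }

            /** New (mapreduce) API: the equivalent added by this issue. */
            static void newApi(Configuration conf) throws IOException {
              Job job = Job.getInstance(conf);
              // Equivalent to setting mapreduce.input.fileinputformat.input.dir.recursive=true.
              FileInputFormat.setInputDirRecursive(job, true);
              FileInputFormat.addInputPath(job, new Path("/r1"));
            }
          }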

          Jason Lowe added a comment -

          I pulled this into branch-2.1-beta as well.


            People

            • Assignee: Devaraj K
            • Reporter: Ramgopal N
            • Votes: 3
            • Watchers: 13