Hadoop Common
  1. Hadoop Common
  2. HADOOP-6890

Improve listFiles API introduced by HADOOP-6870

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: fs
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      This jira is mainly for addressing Suresh's review comments for HADOOP-6870:

      1. General comment: I have concerns about recursive listing. This could be abused by the applications, creating a lot of requests into HDFS.
      2. Any deletion of files/directories while reursing through directories results in RuntimeException and application has a partial result. Should we ignore if a directory was in stack and was not found later when iterating through it?
      3. FileSystem.java

      • listFile() - method javadoc could be better organized - first write about if path is directory and two cases recursive=true and false. Then if path is file and two cases recursive=true or false.
      • listFile() - document throwing RuntimeException, UnsupportedOperationException and the possible cause. IOException is no longer thrown.
        4. TestListFiles.java
      • testDirectory() - comments test empty directory and test directory with 1 file should be moved up to relevant sections of the test.
      1. listFilesInFS.patch
        8 kB
        Hairong Kuang
      2. improveListFiles.patch
        11 kB
        Hairong Kuang
      3. improveListFiles1.patch
        12 kB
        Hairong Kuang
      4. improveListFiles2.patch
        12 kB
        Hairong Kuang

        Issue Links

          Activity

          Hide
          Hairong Kuang added a comment -

          > I have concerns about recursive listing. This could be abused by the applications, creating a lot of requests into HDFS.
          We could put this new API in FileContext.utils. FileSystem will be deprecated soon so no need to worry about it.

          > Any deletion of files/directories while reursing through directories results in RuntimeException and application has a partial result.
          > Should we ignore if a directory was in stack and was not found later when iterating through it?
          GetFiles is used by MapReduce job client. In the scenario that you described, it is good to throw an exception so fail the job earlier. Otherwise, the job won't fail until map tasks are launched.

          I will incorporate the rest of the comments.

          Show
          Hairong Kuang added a comment - > I have concerns about recursive listing. This could be abused by the applications, creating a lot of requests into HDFS. We could put this new API in FileContext.utils. FileSystem will be deprecated soon so no need to worry about it. > Any deletion of files/directories while reursing through directories results in RuntimeException and application has a partial result. > Should we ignore if a directory was in stack and was not found later when iterating through it? GetFiles is used by MapReduce job client. In the scenario that you described, it is good to throw an exception so fail the job earlier. Otherwise, the job won't fail until map tasks are launched. I will incorporate the rest of the comments.
          Hide
          Hairong Kuang added a comment -

          This patch
          1. Move listFiles to be a method of FileContext#Util;
          2. Modify FileContext#Util#listFiles so that the subtree is traversed in the depth-first order.
          3. Improve the comments.

          Show
          Hairong Kuang added a comment - This patch 1. Move listFiles to be a method of FileContext#Util; 2. Modify FileContext#Util#listFiles so that the subtree is traversed in the depth-first order. 3. Improve the comments.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12450922/improveListFiles.patch
          against trunk revision 980648.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/652/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/652/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/652/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/652/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12450922/improveListFiles.patch against trunk revision 980648. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/652/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/652/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/652/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/652/console This message is automatically generated.
          Hide
          Suresh Srinivas added a comment -

          +1 for the patch.

          Show
          Suresh Srinivas added a comment - +1 for the patch.
          Hide
          Hairong Kuang added a comment -

          Thanks Suresh for your quick review!

          The 6 javadoc warnings are all security related, not introduced by this patch.

          Show
          Hairong Kuang added a comment - Thanks Suresh for your quick review! The 6 javadoc warnings are all security related, not introduced by this patch.
          Hide
          Hairong Kuang added a comment -

          I've committed this!

          Show
          Hairong Kuang added a comment - I've committed this!
          Hide
          Hairong Kuang added a comment -

          This patch makes the implementation of listFiles in FileSystem is the same as the one in FileContext except that it does not handle symbolic links. This will make the code easier to maintain later one.

          Show
          Hairong Kuang added a comment - This patch makes the implementation of listFiles in FileSystem is the same as the one in FileContext except that it does not handle symbolic links. This will make the code easier to maintain later one.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12451056/improveListFiles.patch
          against trunk revision 980953.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/655/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/655/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/655/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/655/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12451056/improveListFiles.patch against trunk revision 980953. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/655/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/655/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/655/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/655/console This message is automatically generated.
          Hide
          Hairong Kuang added a comment -

          Fixed the failed test.

          Show
          Hairong Kuang added a comment - Fixed the failed test.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12451058/improveListFiles1.patch
          against trunk revision 980953.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/656/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/656/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/656/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/656/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12451058/improveListFiles1.patch against trunk revision 980953. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/656/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/656/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/656/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/656/console This message is automatically generated.
          Hide
          Hairong Kuang added a comment -

          I can not believe that such a trivial patch made hudson failed twice. Sorry for not running all unit tests locally before submitting the patch previously.

          Show
          Hairong Kuang added a comment - I can not believe that such a trivial patch made hudson failed twice. Sorry for not running all unit tests locally before submitting the patch previously.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12451066/improveListFiles2.patch
          against trunk revision 980953.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/657/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/657/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/657/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/657/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12451066/improveListFiles2.patch against trunk revision 980953. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/657/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/657/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/657/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/657/console This message is automatically generated.
          Hide
          Hairong Kuang added a comment -

          Most of "improveListFiles2.patch" copies the change made in FileContext into FileSystem. Since the change is pretty trivial and FileSystem is an interface to be deprecated soon, so I just committed it.

          Show
          Hairong Kuang added a comment - Most of "improveListFiles2.patch" copies the change made in FileContext into FileSystem. Since the change is pretty trivial and FileSystem is an interface to be deprecated soon, so I just committed it.

            People

            • Assignee:
              Hairong Kuang
              Reporter:
              Hairong Kuang
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development