Hadoop Common
  1. Hadoop Common
  2. HADOOP-6870

Add FileSystem#listLocatedStatus to list a directory's content together with each file's block locations

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: fs
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      This jira implements the new FileSystem API as proposed in HDFS-202. The new API aims to eliminate individual "getFileBlockLocations" calls to NN for each file in the input directory of a job. Instead, a file's block locations are returned together with FileStatus when listing a directory, thus improving getSplits performance.

      1. listFiles4.patch
        22 kB
        Hairong Kuang
      2. listFiles3.patch
        21 kB
        Hairong Kuang
      3. listFiles2.patch
        19 kB
        Hairong Kuang
      4. listFiles1.patch
        19 kB
        Hairong Kuang
      5. listFiles.patch
        17 kB
        Hairong Kuang

        Issue Links

          Activity

          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #370 (See https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/370/)

          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #370 (See https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/370/ )
          Hide
          Suresh Srinivas added a comment -

          Sorry for posting the comments late. I was busy.

          1. General comment: I have concerns about recursive listing. This could be abused by the applications, creating a lot of requests into HDFS.
          2. Any deletion of files/directories while reursing through directories results in RuntimeException and application has a partial result. Should we ignore if a directory was in stack and was not found later when iterating through it?
          3. FileSystem.java
            • listFile() - method javadoc could be better organized - first write about if path is directory and two cases recursive=true and false. Then if path is file and two cases recursive=true or false.
            • listFile() - document throwing RuntimeException, UnsupportedOperationException and the possible cause. IOException is no longer thrown.
          4. TestListFiles.java
            • testDirectory() - comments test empty directory and test directory with 1 file should be moved up to relevant sections of the test.
          Show
          Suresh Srinivas added a comment - Sorry for posting the comments late. I was busy. General comment: I have concerns about recursive listing. This could be abused by the applications, creating a lot of requests into HDFS. Any deletion of files/directories while reursing through directories results in RuntimeException and application has a partial result. Should we ignore if a directory was in stack and was not found later when iterating through it? FileSystem.java listFile() - method javadoc could be better organized - first write about if path is directory and two cases recursive=true and false. Then if path is file and two cases recursive=true or false. listFile() - document throwing RuntimeException, UnsupportedOperationException and the possible cause. IOException is no longer thrown. TestListFiles.java testDirectory() - comments test empty directory and test directory with 1 file should be moved up to relevant sections of the test.
          Hide
          Hairong Kuang added a comment -

          I've committed this!

          Show
          Hairong Kuang added a comment - I've committed this!
          Hide
          Doug Cutting added a comment -

          Looks fine to me now. Thanks!

          Show
          Doug Cutting added a comment - Looks fine to me now. Thanks!
          Hide
          Hairong Kuang added a comment -

          FileStatus uses a very loose equality semantics. It compares only path names. So I decided that LocatedFileStatus uses the same semantics.

          Show
          Hairong Kuang added a comment - FileStatus uses a very loose equality semantics. It compares only path names. So I decided that LocatedFileStatus uses the same semantics.
          Hide
          Dmytro Molkov added a comment -

          I looked over the patch and overall looks good to me.
          The only question I had is the hash/equals of the LocatedFileStatus. Do you ever see a need to have two instances of LocatedFileStatus that differ only in block locations be not equal? Because right now the equality is the same as the FileStatus equality and doesn't take block locations into account, right?

          Show
          Dmytro Molkov added a comment - I looked over the patch and overall looks good to me. The only question I had is the hash/equals of the LocatedFileStatus. Do you ever see a need to have two instances of LocatedFileStatus that differ only in block locations be not equal? Because right now the equality is the same as the FileStatus equality and doesn't take block locations into account, right?
          Hide
          Hairong Kuang added a comment -

          All six javadoc warning are related to security and not introduced by this patch.

          Show
          Hairong Kuang added a comment - All six javadoc warning are related to security and not introduced by this patch.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12450567/listFiles4.patch
          against trunk revision 979485.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 10 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/639/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/639/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/639/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/639/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12450567/listFiles4.patch against trunk revision 979485. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 10 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/639/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/639/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/639/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/639/console This message is automatically generated.
          Hide
          Hairong Kuang added a comment -

          "listFiles4.patch" fixed javadoc warnings & the failed unit test.

          Show
          Hairong Kuang added a comment - "listFiles4.patch" fixed javadoc warnings & the failed unit test.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12450541/listFiles3.patch
          against trunk revision 979485.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/638/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/638/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/638/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/638/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12450541/listFiles3.patch against trunk revision 979485. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/638/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/638/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/638/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/638/console This message is automatically generated.
          Hide
          Hairong Kuang added a comment -

          Doug, can not believe that you did not miss any exception! listFiles3.patch addressed your concern on getSymLink.

          Show
          Hairong Kuang added a comment - Doug, can not believe that you did not miss any exception! listFiles3.patch addressed your concern on getSymLink.
          Hide
          Doug Cutting added a comment -

          LocatedFileStatus() still ignores an exception. Even if it should never be thrown, throwing a RuntimeException is probably better than failing silently.

          Show
          Doug Cutting added a comment - LocatedFileStatus() still ignores an exception. Even if it should never be thrown, throwing a RuntimeException is probably better than failing silently.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12450529/listFiles2.patch
          against trunk revision 979387.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/635/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/635/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/635/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/635/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12450529/listFiles2.patch against trunk revision 979387. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/635/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/635/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/635/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/635/console This message is automatically generated.
          Hide
          Hairong Kuang added a comment -

          "listFiles2.patch" makes FileSystem to throw a RuntimeException when getBlockLocations throws an exception.

          Show
          Hairong Kuang added a comment - "listFiles2.patch" makes FileSystem to throw a RuntimeException when getBlockLocations throws an exception.
          Hide
          Doug Cutting added a comment -

          The next() method in the iterator for FileSystem#listStatus() still returns a LocatedFileStatus when getFileBlockLocations() throws an exception. I think throwing an exception here would be better. The list() method in that class also still swallows exceptions that should probably be thrown.

          > Unit tests for recursive listing of symbolic links will be added after HADOOP-6872 is fixed.

          Can't we test this independently? There are other symlink-based unit tests independent of HDFS.

          Show
          Doug Cutting added a comment - The next() method in the iterator for FileSystem#listStatus() still returns a LocatedFileStatus when getFileBlockLocations() throws an exception. I think throwing an exception here would be better. The list() method in that class also still swallows exceptions that should probably be thrown. > Unit tests for recursive listing of symbolic links will be added after HADOOP-6872 is fixed. Can't we test this independently? There are other symlink-based unit tests independent of HDFS.
          Hide
          Hairong Kuang added a comment -

          This patch addresses Doug's comments.

          Unit tests for recursive listing of symbolic links will be added after HADOOP-6872 is fixed.

          Show
          Hairong Kuang added a comment - This patch addresses Doug's comments. Unit tests for recursive listing of symbolic links will be added after HADOOP-6872 is fixed.
          Hide
          Doug Cutting added a comment -

          Errors in iteration should turn into a subclass of RuntimeException, not silently fail, no?

          Also, it looks to me that recursive file listings across symlinks don't work. Instinctively, it seems to me they should.

          Show
          Doug Cutting added a comment - Errors in iteration should turn into a subclass of RuntimeException, not silently fail, no? Also, it looks to me that recursive file listings across symlinks don't work. Instinctively, it seems to me they should.
          Hide
          Hairong Kuang added a comment -

          An initial patch for review.

          Show
          Hairong Kuang added a comment - An initial patch for review.
          Hide
          Hairong Kuang added a comment -

          Note that listLocatedStatus lists only files. So the API name might be better to be listFiles().

          Iterator<LocatedFileStatus> listFiles(Path path, boolean isRecursive);
          
          Show
          Hairong Kuang added a comment - Note that listLocatedStatus lists only files. So the API name might be better to be listFiles(). Iterator<LocatedFileStatus> listFiles(Path path, boolean isRecursive);

            People

            • Assignee:
              Hairong Kuang
              Reporter:
              Hairong Kuang
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development