Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-538

DistributedFileSystem::listStatus incorrectly returns null for empty result sets

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.1, 0.20.2
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      FileSystem.listStatus() previously returned null for empty or nonexistent directories; will now return empty array for empty directories and throw FileNotFoundException for non-existent directory. Client code should be updated for new semantics.

      Description

      Currently the listStatus method returns null if no files match the request. This differs from the Checksum/LocalFileSystem implementation, which returns an empty array, and the nontvery-explict prescription of the FileSystem interface: "@return the statuses of the files/directories in the given patch" It's better to return an empty collection than have to add extra null checks. The method should return an empty array.

      1. HDFS-538.patch
        2 kB
        Jakob Homan

        Issue Links

          Activity

          Hide
          Todd Lipcon added a comment -

          Ah, I didn't know there was a prior release note - looking at the history, Rob seems to have deleted the old one on 10/09 without putting in a new one. The old release note also did not explain that listStatus now returns empty arrays for empty directories instead of null.

          Show
          Todd Lipcon added a comment - Ah, I didn't know there was a prior release note - looking at the history, Rob seems to have deleted the old one on 10/09 without putting in a new one. The old release note also did not explain that listStatus now returns empty arrays for empty directories instead of null.
          Hide
          Jakob Homan added a comment -

          @Todd - why the change in release note? The previous version better covered the change and provided a clue to clients at the change they need to make.

          Show
          Jakob Homan added a comment - @Todd - why the change in release note? The previous version better covered the change and provided a clue to clients at the change they need to make.
          Hide
          Robert Chansler added a comment -

          Editorial pass over all release notes prior to publication of 0.21.

          Show
          Robert Chansler added a comment - Editorial pass over all release notes prior to publication of 0.21.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #65 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/65/)

          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #65 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/65/ )
          Hide
          Tsz Wo Nicholas Sze added a comment -

          The jar dependence is hard to deal with after the project split. This issue has updated all the jars except for hadoop-mapred-examples-0.21.0-dev.jar, which was changed by MAPREDUCE-874. TestServiceLevelAuthorization is failing now. I will update hadoop-mapred-examples-0.21.0-dev.jar in HDFS-568.

          Show
          Tsz Wo Nicholas Sze added a comment - The jar dependence is hard to deal with after the project split. This issue has updated all the jars except for hadoop-mapred-examples-0.21.0-dev.jar, which was changed by MAPREDUCE-874 . TestServiceLevelAuthorization is failing now. I will update hadoop-mapred-examples-0.21.0-dev.jar in HDFS-568 .
          Hide
          Jakob Homan added a comment -

          Adding correct release note.

          Show
          Jakob Homan added a comment - Adding correct release note.
          Hide
          Chris Douglas added a comment -

          +1

          I committed this. Thanks, Jakob!

          Show
          Chris Douglas added a comment - +1 I committed this. Thanks, Jakob!
          Hide
          Jakob Homan added a comment -

          Also, unit tests were run locally and all pass, once an updated MapReduce jar is provided for run-hdfs-tests-with-mapreduce.

          Show
          Jakob Homan added a comment - Also, unit tests were run locally and all pass, once an updated MapReduce jar is provided for run-hdfs-tests-with-mapreduce.
          Hide
          Chris Douglas added a comment -

          (after applying HADOOP-6201 )

               [exec] +1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
          
          Show
          Chris Douglas added a comment - (after applying HADOOP-6201 ) [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
          Hide
          Jakob Homan added a comment -

          Attaching patch, but can't make it patch available because it depends on a new Common jar from HADOOP-6201. Check the notes there.

          Show
          Jakob Homan added a comment - Attaching patch, but can't make it patch available because it depends on a new Common jar from HADOOP-6201 . Check the notes there.
          Hide
          Jakob Homan added a comment -

          Since I've not heard any counter-lobbying about throwing FileNotFound, I've opened an issue to do this and am blocking this issue until it's resolved.

          Show
          Jakob Homan added a comment - Since I've not heard any counter-lobbying about throwing FileNotFound, I've opened an issue to do this and am blocking this issue until it's resolved.
          Hide
          Sanjay Radia added a comment -

          Kons says:
          >Good point. Shouldn't we be throwing FileNotFoundException as other methods?
          Agreed. it should throw the exception if file is not found, instead of returning a null.

          Show
          Sanjay Radia added a comment - Kons says: >Good point. Shouldn't we be throwing FileNotFoundException as other methods? Agreed. it should throw the exception if file is not found, instead of returning a null.
          Hide
          Jakob Homan added a comment -

          Whoops. Looks like Konstantin and I were responding at the same time. As a more general point, it seems quite preferable for methods not to return null, but either return the object they promised to or throw an exception.

          Show
          Jakob Homan added a comment - Whoops. Looks like Konstantin and I were responding at the same time. As a more general point, it seems quite preferable for methods not to return null, but either return the object they promised to or throw an exception.
          Hide
          Jakob Homan added a comment -

          How do we distinguish between an empty directory and a non-existent file (or directory)?

          The similar method getFileStatus() returns a FileNotFoundException, which seems the most reasonable and expected behavior.

          The API documentation is exceedingly poor for this method: what happens if the passed-in path is not a directory?

          Show
          Jakob Homan added a comment - How do we distinguish between an empty directory and a non-existent file (or directory)? The similar method getFileStatus() returns a FileNotFoundException, which seems the most reasonable and expected behavior. The API documentation is exceedingly poor for this method: what happens if the passed-in path is not a directory?
          Hide
          Konstantin Shvachko added a comment -

          > How do we distinguish between an empty directory and a non-existent file (or directory)?

          Good point. Shouldn't we be throwing FileNotFoundException as other methods? We do not throw right now, but it would makes listStatus consistent with other methods, imo.

          Show
          Konstantin Shvachko added a comment - > How do we distinguish between an empty directory and a non-existent file (or directory)? Good point. Shouldn't we be throwing FileNotFoundException as other methods? We do not throw right now, but it would makes listStatus consistent with other methods, imo.
          Hide
          Tom White added a comment -

          How do we distinguish between an empty directory and a non-existent file (or directory)? In the current implementation, an empty directory returns an empty array, and a non-existent file returns null. I think we should keep this distinction and change the documentation of FileSystem, and the implementation of LocalFileSystem to conform.

          it seems to me to be important to fix the semantics sooner than later.

          I agree. This should be made clear in the release notes.

          Show
          Tom White added a comment - How do we distinguish between an empty directory and a non-existent file (or directory)? In the current implementation, an empty directory returns an empty array, and a non-existent file returns null. I think we should keep this distinction and change the documentation of FileSystem, and the implementation of LocalFileSystem to conform. it seems to me to be important to fix the semantics sooner than later. I agree. This should be made clear in the release notes.
          Hide
          Jakob Homan added a comment -

          Tom-
          I've actually run into an issue with this. Patch is done, but when running it through unit tests, it fails TestHDFSFileSystemContract, which includes a test called testListStatusReturnsNullForNonExistentFile(), which was added back in HADOOP-930. This test is evaluating an element of the contract that seems incorrect, particularly given the FileSystem API documention: "@return the statuses of the files/directories in the given patch," which does not mention returning null. It would be much better in this situation to just return a zero-length array. Otherwise, every call to listStatus has to include an extra null check, rather than just iterating over the array (which still succeeds with a zero-length array) or actually checking for zero files via the length of the array equaling zero.

          This is borne out by listStatus' use in HDFS codebase, where there is only one check for null; the others assume an array will be returned and could potentially NPE at some point.

          The rub is that the S3 FileSystem implementations do return nulls in this situation, so if we fix the contract, these will have an incompatible change introduced. I'm not sure how wide-felt that would be, but it seems to me to be important to fix the semantics sooner than later. The other option would be to change the API specification and continue with non-intuitive behavior, as well as changing HDFS to support it. Thoughts?

          Show
          Jakob Homan added a comment - Tom- I've actually run into an issue with this. Patch is done, but when running it through unit tests, it fails TestHDFSFileSystemContract , which includes a test called testListStatusReturnsNullForNonExistentFile() , which was added back in HADOOP-930 . This test is evaluating an element of the contract that seems incorrect, particularly given the FileSystem API documention: " @return the statuses of the files/directories in the given patch ," which does not mention returning null. It would be much better in this situation to just return a zero-length array. Otherwise, every call to listStatus has to include an extra null check, rather than just iterating over the array (which still succeeds with a zero-length array) or actually checking for zero files via the length of the array equaling zero. This is borne out by listStatus' use in HDFS codebase, where there is only one check for null; the others assume an array will be returned and could potentially NPE at some point. The rub is that the S3 FileSystem implementations do return nulls in this situation, so if we fix the contract, these will have an incompatible change introduced. I'm not sure how wide-felt that would be, but it seems to me to be important to fix the semantics sooner than later. The other option would be to change the API specification and continue with non-intuitive behavior, as well as changing HDFS to support it. Thoughts?
          Hide
          Tom White added a comment -

          +1 to making the interfaces consistent. This difference is a good start; see other differences in HDFS-303.

          FileSystemContractBaseTest is a set of tests for testing the general contract of FileSystem for different filesystem implementations. Currently only HDFS and S3 filsystems are being tested with it - it would be good to test all filesystems with it.

          Show
          Tom White added a comment - +1 to making the interfaces consistent. This difference is a good start; see other differences in HDFS-303 . FileSystemContractBaseTest is a set of tests for testing the general contract of FileSystem for different filesystem implementations. Currently only HDFS and S3 filsystems are being tested with it - it would be good to test all filesystems with it.

            People

            • Assignee:
              Jakob Homan
              Reporter:
              Jakob Homan
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development