Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: fs
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      We should remove FileSystem.isDirectory().

      1. 5045.new.patch
        10 kB
        Suresh Srinivas
      2. 5045.patch
        11 kB
        Suresh Srinivas
      3. 5045.patch
        7 kB
        Suresh Srinivas

        Issue Links

          Activity

          Hide
          Jakob Homan added a comment -

          Re-assigning to Suresh, who is farther along than I.

          Show
          Jakob Homan added a comment - Re-assigning to Suresh, who is farther along than I.
          Hide
          Suresh Srinivas added a comment -

          I am not sure why we deprecated FileSystem.isDirectory(path) method in favor of using FileSystem.getFileStatus(path).isDir(). The method is implemented as in org.apache.hadoop.fs.FileSystem.java as:

          /** True iff the named path is a directory. */
          /** @deprecated Use getFileStatus() instead */ @Deprecated
          public boolean isDirectory(Path f) throws IOException {
            try {
              return getFileStatus(f).isDir();
            } catch (FileNotFoundException e) {
              return false;               // f does not exist
            }
          }
          

          It is convenient in the following cases (there are many instance of this in the code):

            if (fs.isDirectory(path)) {
              // do something
            }
          

          Changing it to use getFileStatus requires catching the exception thrown by the method when the file does not exist as follows:

            boolean directory = false;
            try {
              FileStatus filestatus = fs.getFileStatus(path);
              directory = filestatus.isDir();
            } catch (FileNotFoundException ex) {
              // path does not exist
            }
            
            if (directory) {
              // do something
            }
          

          The above code needs to be added at all the places where isDirectory() is being checked. This functionality is what the deprecated method provides, and it gives the convenience of not repeating the try-catch every where.

          Show
          Suresh Srinivas added a comment - I am not sure why we deprecated FileSystem.isDirectory(path) method in favor of using FileSystem.getFileStatus(path).isDir() . The method is implemented as in org.apache.hadoop.fs.FileSystem.java as: /** True iff the named path is a directory. */ /** @deprecated Use getFileStatus() instead */ @Deprecated public boolean isDirectory(Path f) throws IOException { try { return getFileStatus(f).isDir(); } catch (FileNotFoundException e) { return false; // f does not exist } } It is convenient in the following cases (there are many instance of this in the code): if (fs.isDirectory(path)) { // do something } Changing it to use getFileStatus requires catching the exception thrown by the method when the file does not exist as follows: boolean directory = false; try { FileStatus filestatus = fs.getFileStatus(path); directory = filestatus.isDir(); } catch (FileNotFoundException ex) { // path does not exist } if (directory) { // do something } The above code needs to be added at all the places where isDirectory() is being checked. This functionality is what the deprecated method provides, and it gives the convenience of not repeating the try-catch every where.
          Hide
          Doug Cutting added a comment -

          The motivation for deprecating such Path-based methods was that, in many cases other file operations were made concerning the same path, and namenode load could be minimized if listStatus() or getStatus() were used. So we should not blindly replace all calls with some boilerplate, but rather examine each. For example, if the directory is itself from a directory listing, then a FileStatus could already be available. If we find that there are many calls to isDirectory() that really stand alone, where a path is directly provided by user code, then we could consider de-deprecating this method, and instead adding a comment cautioning developers to try to use listStatus() or getStatus() instead to minimize namenode load.

          Show
          Doug Cutting added a comment - The motivation for deprecating such Path-based methods was that, in many cases other file operations were made concerning the same path, and namenode load could be minimized if listStatus() or getStatus() were used. So we should not blindly replace all calls with some boilerplate, but rather examine each. For example, if the directory is itself from a directory listing, then a FileStatus could already be available. If we find that there are many calls to isDirectory() that really stand alone, where a path is directly provided by user code, then we could consider de-deprecating this method, and instead adding a comment cautioning developers to try to use listStatus() or getStatus() instead to minimize namenode load.
          Hide
          Suresh Srinivas added a comment -

          FileSystem currently has two convenient methods - isDirectory(path) and isFile(path). Both of this can cause some one to write code that creates FileStatus unnecessarily. However previously only isDirectory(path) was deprecated.

          There are many instances in the code that are using both these methods. I think it is a convenient methods that prevents having multiple instances of equivalent implementation. My suggestion is to de-deprecate the method. I have attached a patch to do the same. The patch also modifies code to reuse FileStatus where it is possible.

          Show
          Suresh Srinivas added a comment - FileSystem currently has two convenient methods - isDirectory(path) and isFile(path) . Both of this can cause some one to write code that creates FileStatus unnecessarily. However previously only isDirectory(path) was deprecated. There are many instances in the code that are using both these methods. I think it is a convenient methods that prevents having multiple instances of equivalent implementation. My suggestion is to de-deprecate the method. I have attached a patch to do the same. The patch also modifies code to reuse FileStatus where it is possible.
          Hide
          Doug Cutting added a comment -

          There are some whitespace changes and additions of serialVersionUID in your patch that should not be there.

          The most common case we'd like to fix is where listStatus() is used, then one needlessly calls isDirectory() on each path listed rather than using the status, so the javadoc should mention listStatus() as well as getStatus(). The methods this patch modifies in FileUtil are examples of this. If the src parameter of copy() and copyMerge() were FileStatus rather than Path then a call to getStatus could be avoided. A fix is to add private methods with these new parameters, and have the public methods call getStatus and pass the value to the private methods which then walk the directory tree.

          Show
          Doug Cutting added a comment - There are some whitespace changes and additions of serialVersionUID in your patch that should not be there. The most common case we'd like to fix is where listStatus() is used, then one needlessly calls isDirectory() on each path listed rather than using the status, so the javadoc should mention listStatus() as well as getStatus(). The methods this patch modifies in FileUtil are examples of this. If the src parameter of copy() and copyMerge() were FileStatus rather than Path then a call to getStatus could be avoided. A fix is to add private methods with these new parameters, and have the public methods call getStatus and pass the value to the private methods which then walk the directory tree.
          Hide
          Tsz Wo Nicholas Sze added a comment -
          • We should use fs.getFileStatus(src).isDir(), instead of fs.isDirectory(src) , in the cases that src must exist or the operation must abort. Found these cases in
            • ChecksumFileSystem line 402 and line 499
            • FsShell line 1034 and line 1087
          • FsShell.copyFromStdin(..) should use getFileStatus for checking exist and isDir in one shot.
          • In FsShell line 835, srcFs.isDirectory(dst) looks like a bug since it uses srcFs with dst path.
          Show
          Tsz Wo Nicholas Sze added a comment - We should use fs.getFileStatus(src).isDir(), instead of fs.isDirectory(src) , in the cases that src must exist or the operation must abort. Found these cases in ChecksumFileSystem line 402 and line 499 FsShell line 1034 and line 1087 FsShell.copyFromStdin(..) should use getFileStatus for checking exist and isDir in one shot. In FsShell line 835, srcFs.isDirectory(dst) looks like a bug since it uses srcFs with dst path.
          Hide
          Suresh Srinivas added a comment -

          I have attached new patch to incorporate Doug's comments. I have made use of FileStatus returned by listStatus() and removed white space changes.

          My comments on suggestions from Nicholas:

          • We should use fs.getFileStatus(src).isDir(), instead of fs.isDirectory(src) , in the cases that src must exist or the operation must abort. Found these cases in
            o ChecksumFileSystem line 402 and line 499
            o FsShell line 1034 and line 1087

          > This cannot be done without further changes. Today all these method return false when a file does not exist. The methods will throw FileNotFoundException instead of returning false with this change.

          • FsShell.copyFromStdin(..) should use getFileStatus for checking exist and isDir in one shot.
            > Again in this case the exception thrown would be different which results in unit testcase failure.
          • In FsShell line 835, srcFs.isDirectory(dst) looks like a bug since it uses srcFs with dst path.
            > Looks like becase the operation is rename the expectation is that both the source and the destination file systems are of the same type. The code could have been better orgainized if we had only one filesystem variable in the method instead of two. I have made that change.
          Show
          Suresh Srinivas added a comment - I have attached new patch to incorporate Doug's comments. I have made use of FileStatus returned by listStatus() and removed white space changes. My comments on suggestions from Nicholas: We should use fs.getFileStatus(src).isDir(), instead of fs.isDirectory(src) , in the cases that src must exist or the operation must abort. Found these cases in o ChecksumFileSystem line 402 and line 499 o FsShell line 1034 and line 1087 > This cannot be done without further changes. Today all these method return false when a file does not exist. The methods will throw FileNotFoundException instead of returning false with this change. FsShell.copyFromStdin(..) should use getFileStatus for checking exist and isDir in one shot. > Again in this case the exception thrown would be different which results in unit testcase failure. In FsShell line 835, srcFs.isDirectory(dst) looks like a bug since it uses srcFs with dst path. > Looks like becase the operation is rename the expectation is that both the source and the destination file systems are of the same type. The code could have been better orgainized if we had only one filesystem variable in the method instead of two. I have made that change.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          +1 patch looks good except there are white space changes.

          Show
          Tsz Wo Nicholas Sze added a comment - +1 patch looks good except there are white space changes.
          Hide
          Suresh Srinivas added a comment -

          New patch with white space changes removed. Thanks Nicholas for catching it.

          Show
          Suresh Srinivas added a comment - New patch with white space changes removed. Thanks Nicholas for catching it.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12398946/5045.new.patch
          against trunk revision 738744.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3772/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3772/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3772/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3772/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12398946/5045.new.patch against trunk revision 738744. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3772/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3772/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3772/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3772/console This message is automatically generated.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Edited summary: "Remove deprecated FileSystem.isDirectory()" => "FileSystem.isDirectory() should not be deprecated."

          Show
          Tsz Wo Nicholas Sze added a comment - Edited summary: "Remove deprecated FileSystem.isDirectory()" => "FileSystem.isDirectory() should not be deprecated."
          Hide
          Tsz Wo Nicholas Sze added a comment -

          I just committed this. Thanks, Suresh!

          Show
          Tsz Wo Nicholas Sze added a comment - I just committed this. Thanks, Suresh!
          Hide
          Hudson added a comment -
          Show
          Hudson added a comment - Integrated in Hadoop-trunk #756 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/756/ )

            People

            • Assignee:
              Suresh Srinivas
              Reporter:
              Tsz Wo Nicholas Sze
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development