Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
       HDFS-245. Adds a symlink implementation to HDFS. This complements the new symlink feature added in HADOOP-6421.

      Description

      HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.
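
      For illustration, here is a minimal sketch of how a client could create and use such a link through the FileContext API added in HADOOP-6421. The configuration, paths, and cluster in this sketch are assumptions for the example, not part of this issue.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataInputStream;
      import org.apache.hadoop.fs.FileContext;
      import org.apache.hadoop.fs.Path;

      public class SymlinkExample {
        public static void main(String[] args) throws Exception {
          // Assumes fs.defaultFS points at an HDFS cluster; the paths are made up.
          FileContext fc = FileContext.getFileContext(new Configuration());
          Path target = new Path("/user/alice/data/file.txt");
          Path link   = new Path("/user/alice/latest");

          // Create the symlink; createParent=false means the parent directory must exist.
          fc.createSymlink(target, link, false);

          // Programs that read through the link behave as if operating on the target.
          FSDataInputStream in = fc.open(link);
          in.close();

          // Tools such as archivers can instead inspect the link itself without following it.
          System.out.println(fc.getFileLinkStatus(link).getSymlink());
        }
      }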

      Attachments

      1. symLink1.patch
        5 kB
        dhruba borthakur
      2. symLink1.patch
        44 kB
        dhruba borthakur
      3. symLink4.patch
        157 kB
        dhruba borthakur
      4. symLink5.patch
        162 kB
        dhruba borthakur
      5. symLink6.patch
        158 kB
        dhruba borthakur
      6. symLink8.patch
        155 kB
        dhruba borthakur
      7. symLink9.patch
        161 kB
        dhruba borthakur
      8. HADOOP-4044-strawman.patch
        4 kB
        Doug Cutting
      9. symLink11.patch
        185 kB
        dhruba borthakur
      10. 4044_20081030spi.java
        6 kB
        Tsz Wo Nicholas Sze
      11. symLink12.patch
        199 kB
        dhruba borthakur
      12. symLink13.patch
        200 kB
        dhruba borthakur
      13. symLink14.patch
        200 kB
        dhruba borthakur
      14. symLink15.txt
        213 kB
        dhruba borthakur
      15. symLink15.txt
        213 kB
        dhruba borthakur
      16. symlink-0.20.0.patch
        155 kB
        weimin zhu
      17. symlink16-common.patch
        110 kB
        Eli Collins
      18. symlink16-hdfs.patch
        138 kB
        Eli Collins
      19. symlink16-mr.patch
        12 kB
        Eli Collins
      20. designdocv1.txt
        7 kB
        Eli Collins
      21. symlink17-common.txt
        47 kB
        dhruba borthakur
      22. symlink17-hdfs.txt
        71 kB
        dhruba borthakur
      23. symlink18-common.txt
        24 kB
        dhruba borthakur
      24. symlink19-common.txt
        24 kB
        dhruba borthakur
      25. symlink19-common.txt
        25 kB
        dhruba borthakur
      26. symlink19-hdfs.txt
        66 kB
        dhruba borthakur
      27. designdocv2.txt
        12 kB
        Eli Collins
      28. symlink19-common-delta.patch
        6 kB
        Eli Collins
      29. symlink19-hdfs-delta.patch
        45 kB
        Eli Collins
      30. symlink20-hdfs.patch
        96 kB
        Eli Collins
      31. symlink20-common.patch
        31 kB
        Eli Collins
      32. symlink21-common.patch
        31 kB
        Eli Collins
      33. symlink21-hdfs.patch
        96 kB
        Eli Collins
      34. symlink22-common.patch
        31 kB
        Eli Collins
      35. symlink22-hdfs.patch
        100 kB
        Eli Collins
      36. symlink23-hdfs.patch
        101 kB
        Eli Collins
      37. symlink23-common.patch
        34 kB
        Eli Collins
      38. symlink24-hdfs.patch
        103 kB
        Eli Collins
      39. symlink-25-hdfs.patch
        107 kB
        Eli Collins
      40. symlink-26-hdfs.patch
        107 kB
        Eli Collins
      41. symlink-26-hdfs.patch
        108 kB
        Eli Collins
      42. symlink27-hdfs.patch
        113 kB
        Eli Collins
      43. designdocv3.txt
        13 kB
        Eli Collins
      44. symlink28-hdfs.patch
        117 kB
        Eli Collins
      45. symlink29-hdfs.patch
        117 kB
        Eli Collins
      46. symlink29-hdfs.patch
        117 kB
        Eli Collins
      47. symlink30-hdfs.patch
        117 kB
        Eli Collins
      48. symlink31-hdfs.patch
        97 kB
        Eli Collins
      49. symlink33-hdfs.patch
        268 kB
        Eli Collins
      50. symlink35-hdfs.patch
        176 kB
        Eli Collins
      51. symlink36-hdfs.patch
        178 kB
        Eli Collins
      52. symlink37-hdfs.patch
        178 kB
        Eli Collins
      53. design-doc-v4.txt
        9 kB
        Eli Collins
      54. symlink38-hdfs.patch
        181 kB
        Eli Collins
      55. symlink39-hdfs.patch
        181 kB
        Eli Collins
      56. symlink40-hdfs.patch
        180 kB
        Eli Collins
      57. symlink41-hdfs.patch
        182 kB
        Eli Collins

        Issue Links

          Activity

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #275 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/275/)

          Hudson added a comment -

          Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #302 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/302/)

          Hudson added a comment -

          Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #146 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/146/)

          Sanjay Radia added a comment -

          +1
          Committed the patch.
          Thanks Eli

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #201 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/201/)
          . Adds a symlink implementation to HDFS. This complements the new symlink feature added in HADOOP-6421
          (Eli Collins via Sanjay Radia)

          Eli Collins added a comment -

          The contrib failure is the same one mentioned previously, failure to download cactus.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12436786/symlink41-hdfs.patch
          against trunk revision 915089.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/243/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/243/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/243/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/243/console

          This message is automatically generated.

          Eli Collins added a comment -

          Hey Sanjay,

          The attached patch addresses your feedback and merges with trunk. I had to resolve a number of diffs with, and add new code for, HDFS-946, which introduced HdfsFileStatus, so it would be good to check out those changes in particular.

          Details:

          • Added missing throws of UnresolvedLinkException to DFSClient
          • Removed UnresolvedLinkException from NameNode
          • Fixed Lease#findPath per your suggestion, did the same to getCompleteBlocksTotal since it should also not throw an UnresolvedLinkException
          • Filed HADOOP-6585 and HDFS-995 to do the new FsStatus APIs and convert existing code to the new interfaces.
          • FSNameSystem#verifyParentDir now passes true for resolveLink, tests are in the base class so will be in an upcoming common change.
          • Replaced the numBlocks "> -2" check with "== -1"
          • Went through the whitespace changes; they should now only remove trailing whitespace.
          • Updated HDFS-995 with your suggestions for checking #isDirectory()
          • I noticed INode#constructPath was using StringBuffer, changed it to use StringBuilder
          • FSNamesystem#metaSave no longer throws UnresolvedSymlinkException. It just uses java.io.File for the local file, so symlinks should be handled transparently by the local file system.
          • Also, for those following along, we should not have to worry about UnresolvedLinkExceptions in loadFSImage and friends because intermediate symlinks are resolved when objects are created, so the paths that get persisted are the fully resolved ones, i.e. creating /link/file will persist /dir/file, so when we load the image we will call addToParent(.."/dir/file"..). This should be clearer when volume-relative symlinks are resolved internally (HDFS-932). The code asserts on these cases.
          • getLinkTarget never throws an UnresolvedLinkException, it always resolves the first symlink in the path and ignores the remainder, so I left the throws clause as is

          Thanks,
          Eli

          Sanjay Radia added a comment -

          @sanjay
          > You forgot to add the exceptions to FSNamesystem. ...
          Sorry my mistake here; I must have checked it in the wrong eclipse project with your previous patch.

          Sanjay Radia added a comment -

          Thanks for addressing the issues mentioned in the comments.
          I caught a few things you changed since patch 31.
          (you have already responded to some of these in my clarifying email questions - please document your responses in this jira.).
          Most of the issues below can be addressed in separate jiras.

          • Unnecessary whitespace changes
            There are approximately 10 accidental, unnecessary whitespace changes. I will send you the list via email to keep this comment small.
          • UnresolvedLinkException: - thanks for fixing them - you however missed a few - most of these can be fixed in a separate Jira.
            • getLinkTarget in DFSClient and Hdfs
            • the different variants of create in DFSClient - for consistency.
            • ClientProtocol#saveNamespace - should not throw the exception (no path parameter)
            • FSNamesystem#metaSave should convert the UnresolvedSymlink exception into another argument (IllegalArg?)
              • We don't allow one to save the meta data into a remote file system
              • However, we could allow traversing of local symlinks - but the current implementation processes these on the client side. What is the best way to deal with local symlinks for this operation? (For now, we should just document that this operation does not work for local symlinks.)
            • You forgot to add the exceptions to FSNamesystem. Best done in a separate Jira.
            • Lease#findPath throws an UnresolvedLinkException
              • this should not occur.
              • the method should catch it and log an error/assertion
          • After catching isDir() incompatibility issue in HADOOP-6421, I realized that we should, in a separate jira,
            add a INode#isFile() and FSDirectory#isFile(src) and cleanup the calls to !isDir() and !isDirectory()
            Please add these checks as part of that Jira:
            • FSImage#loadFilesUnderConstruction
              if (old.isDirectory()) ...
              add an additional check for isSymlink() – or better !isFile()
            • FSNamesystem#startFileInternal
              } else if (myFile.isDirectory()) {
                        ....
              } else if (myFile.isSymlink()) {
                          assert(false, cannot reach here ) <-- add
              }
              
          • Minor code improvements for the quota changes you added since last patch:
            • the ">-2" should be "== -1" just like you did "== -2" for symlink everywhere else
                      // get quota only when the node is a directory
                      long nsQuota = -1L;
                      if (imgVersion <= -16 && blocks == null && numBlocks > -2) {
                        nsQuota = in.readLong();
                      }
                      long dsQuota = -1L;
                      if (imgVersion <= -18 && blocks == null && numBlocks > -2) {
                        dsQuota = in.readLong();
                      }
              
          • Bug in verifyParentDir (a change from patch 31 I last reviewed)
            verifyParentDir calls
            INode[] pathINodes = dir.getExistingPathINodes(parent.toString(), false);
            Should the followLink be true?
            Add tests in your tests jira to cover this one
            • Create, createSymlink, mkdir with arg createParent=false and a symlink in the parent path.
          Eli Collins added a comment -

          The core test failure in TestBlockReport looks like HDFS-733; it passes for me locally. The contrib test failure is Hudson failing to download Tomcat.

          [exec] BUILD FAILED
          [exec] /grid/0/hudson/hudson-slave/workspace/Hdfs-Patch-h5.grid.sp2.yahoo.net/trunk/build.xml:569: The following error occurred while executing this line:
          [exec] /grid/0/hudson/hudson-slave/workspace/Hdfs-Patch-h5.grid.sp2.yahoo.net/trunk/src/contrib/build.xml:48: The following error occurred while executing this line:
          [exec] /grid/0/hudson/hudson-slave/workspace/Hdfs-Patch-h5.grid.sp2.yahoo.net/trunk/src/contrib/hdfsproxy/build.xml:292: org.codehaus.cargo.container.ContainerException: Failed to download http://apache.osuosl.org/tomcat/tomcat-6/v6.0.18/bin/apache-tomcat-6.0.18.zip

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12436052/symlink40-hdfs.patch
          against trunk revision 911210.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/236/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/236/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/236/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/236/console

          This message is automatically generated.

          Eli Collins added a comment -

          Test failures are due to HDFS-981, HDFS-982, and errors reading partially downloaded zip files.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12436052/symlink40-hdfs.patch
          against trunk revision 910760.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 release audit. The applied patch generated 117 release audit warnings (more than the trunk's current 0 warnings).

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/233/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/233/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/233/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/233/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/233/console

          This message is automatically generated.

          Eli Collins added a comment -

          Patch attached. Removes testSetTimes since it was moved to common, but is otherwise the same.

          Eli Collins added a comment -

          Attached patch merges with trunk.

          Eli Collins added a comment -

          Attached patch merges with trunk to resolve conflicts with the DFSClient refactoring.

          Eli Collins added a comment -

          Attached the latest design doc that captures the final path resolution and API changes, as well as incorporates feedback from Hairong.

          Eli Collins added a comment -

          Hey Sanjay,

          Thanks for posting; these are all addressed in patches 34-36, modulo using FileContextTestHelper, which I'll defer until HADOOP-5916 is resolved. Attached a patch that's a minor update to the last one to fix a bug in ImageLoaderCurrent.

          Thanks,
          Eli

          Sanjay Radia added a comment -

          Feedback on patch 31: TestFcHdfsSymlink

          • dfsClient is unused.
          • FileContext fc = FileContext.getFileContext(fs.getUri());
            should be
            FileContext fc = FileContext.getFileContext(cluster.getUri());
            The FileSystem interface should not be used at all in any of the FileContext tests.
          • Teardown
            move the deletes for base dir to teardown - easier to add new tests
          • testDiffFileContextRelTarget() why set wd?
          • testAccessLinkFromFileSystem() - this should really be FromAbsFileSystem (or FromAFS).
          • testSetPermission
            Also add a few additional lines to test set-permission on link to a dir
          • Add test for quota - setQuota should not follow symlink.
          Eli Collins added a comment -

          Hey Hairong, thanks for taking a look!

          1. INodeSymlink extends INodeFile in your patch. Should it extend INode?

          The current code extends INodeFile because logically a symlink is a file (that just contains a path), and it makes the change smaller (by avoiding overriding spaceConsumedInTree, collectSubtreeBlocksAndClear, computeContentSummary, and allowing all the code that handles INodeFile to cover files and symlinks). I like your suggestion though; explicitly differentiating between file and symlink inodes seems less error prone, and there's a minor win in that the code no longer has to interpret and serialize the INodeFile fields that are unused for symlinks. I made INodeSymlink extend INode and modified FSImage#loadFSImage and saveINode2Image, and ImageLoaderCurrent accordingly. Leveraged the existing method of using a negative block size value to indicate the inode type. I also had to add a couple of additional INode#isLink checks in places where we were leveraging the fact that symlinks were files.
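
          For readers following along, a rough sketch of the shape of this change; the classes below are illustrative only, and their fields and method names are assumptions, not the actual HDFS source.

          // Illustrative sketch, not the real HDFS classes: extending INode rather
          // than INodeFile means a symlink carries no block, replication, or quota
          // state of its own; it only stores the target path it points to.
          abstract class INode {
            protected byte[] name;
            abstract boolean isDirectory();
            boolean isLink() { return false; }
          }

          class INodeSymlink extends INode {
            private final byte[] symlink;   // the target path, stored like other names

            INodeSymlink(String target) {
              this.symlink = target.getBytes();
            }

            @Override boolean isDirectory() { return false; }
            @Override boolean isLink() { return true; }

            String getLinkValue() { return new String(symlink); }
          }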

          2. Should FSDirectory#mkdirs, rootDir.getExistingPathINodes(components, inodes, true) be rootDir.getExistingPathINodes(components, inodes, false)? Mkdirs does not need to resolve the symlink of the last component.

          Yup, there's no need for mkdir to try to resolve the final link. I also added throws UnresolvedLinkException, which was missing from the javadoc.

          3. Your document does not discuss the symlink resolution of concat, setQuota, getContentSummary, etc. Could you please add them?

          I filed HDFS-944 for FileContext and AFS to provide the full ClientProtocol interface; currently these operations are not reachable via FileContext or AFS. Since symlink resolution is partly implemented in FileContext, clients must be ported to FileContext to use them, i.e. if you create a symlink via FileContext and then access it via DistributedFileSystem#setQuota you'll get an UnresolvedLinkException. Symlinks won't be exposed to users until we've moved off FileSystem (HADOOP-6446). We first need to port (or rather have FileContext versions of) all the tests that haven't been covered (HDFS-876 and HDFS-934).

          I will update the design doc on the jira tomorrow to reflect the documentation I've put in the jira comments, and the following.

          Patch attached here and to HADOOP-6421, addresses the above and some additional feedback from Sanjay.

          Thanks,
          Eli

          Hairong Kuang added a comment -

          Eli, great work! A couple of more comments:
          1. INodeSymlink extends INodeFile in your patch. Should it extend INode?
          2. Should FSDirectory#mkdirs, rootDir.getExistingPathINodes(components, inodes, true) be rootDir.getExistingPathINodes(components, inodes, false)? Mkdirs does not need to resolve the symlink of the last component.
          3. Your document does not discuss the symlink resolution of concat, setQuota, getContentSummary, etc. Could you please add them?

          Eli Collins added a comment -

          Hey Sanjay,

          Thanks for the review. Latest patch attached, addresses your feedback and merges in trunk.

          File Jira to resolve dot relative and volume-root relative names locally in HDFS (this Jira can be completed later)

          Filed HDFS-932.

          explicitly declare the unresolvedLinkException in methods that throw them- we are moving towards declaring all exceptions (see HDFS-717). ClientProtocol, Namenode, FSNamesystem, etc

          Done. I had to modify quite a few functions both in hdfs and common since the UnresolvedLinkException flows all the way up to FileContext. I left the tests as is, I also didn't modify FileSystem.
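
          As a standalone illustration of that flow (not the actual FileContext or FSLinkResolver code; the class, the hop bound, and the stub server call below are all assumptions), the client-side handling amounts to a bounded retry loop: the server throws when it hits a symlink, the client splices in the link target and retries.

          import java.io.IOException;

          public class LinkResolveLoopSketch {
            static final int MAX_LINK_HOPS = 32;  // illustrative bound to stop cycles

            // Stand-in for the exception carrying the link target and the rest of the path.
            static class UnresolvedLinkEx extends IOException {
              final String linkTarget, remainder;
              UnresolvedLinkEx(String linkTarget, String remainder) {
                this.linkTarget = linkTarget;
                this.remainder = remainder;
              }
            }

            // Stand-in for a NameNode call that rejects paths going through /link.
            static String serverGetStatus(String path) throws UnresolvedLinkEx {
              if (path.startsWith("/link/")) {
                throw new UnresolvedLinkEx("/dir", path.substring("/link".length()));
              }
              return "status of " + path;
            }

            static String getStatus(String path) throws IOException {
              for (int hops = 0; hops < MAX_LINK_HOPS; hops++) {
                try {
                  return serverGetStatus(path);
                } catch (UnresolvedLinkEx e) {
                  path = e.linkTarget + e.remainder;  // splice in the target, retry
                }
              }
              throw new IOException("Possible symlink loop while resolving " + path);
            }

            public static void main(String[] args) throws IOException {
              System.out.println(getStatus("/link/file"));  // prints: status of /dir/file
            }
          }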

          UnresolvedPathException, suggestion: adding a method called "getUnresolvedPath" for completeness.

          Done.

          createSymlinkXXX -don't pass in permission object as parameter - it suggests that permissions matter for symlinks

          Done. I left the permissions as the default since INodes do not support a null PermissionStatus or a PermissionStatus created with a null FsPermission. Currently links are rwx for everyone, so access is checked on the link target. And as with files, access to the links themselves should be subject to the containing directory's permissions. I started to modify TestDFSPermission to cover this, which requires converting the test to use FileContext; just converting it caused some of the tests to fail, which I'm looking into and, if you don't mind, will address in a follow-on jira. I filed HDFS-876 for porting tests over to FileContext a while back and HDFS-934 to extend those for additional coverage after they've been ported.

          FSDirectory#addToParent - if not a symlink, then pass null instead of a null string.

          The symlink value needs to be a string since we serialize it to image/edit log, ie readString and writeString don't support nulls.

          getExistingPathINodes() typo: javadoc for param resolveLink is missing the param name

          Fixed.

          Add comment that exception is thrown for symlinks occurs in the middle of the path.

          Done.

          The multiple if statements with same or negation of condition - can be confusing and error prone in future

          I rewrote this to use just two if statements that read like the logic: "If the current node is a symlink then throw an UnresolvedPathException, unconditionally if we are not on the last path component, and conditionally if we are on the last path component and we want to resolve it." I think this is clearer.

          if (curNode.isLink() && (!lastComp || (lastComp && resolveLink))) {
            throw new UnresolvedPathException(...)
          }
          if (lastComp || !curNode.isDirectory()) {
            break;
          }
          

          FSDirectory#getFileBlocks add: if (symlink) return null.

          Fixed. Shouldn't be executed in practice since we always complete a resolved path.

          FSDirectory#getListing() calls getNode(srcs, false) // shouldn't resolveLink be true

          Fixed. FileContext#listStatus was not resolving links, fixed that. I added FileContextSymlinkBaseTest#testListStatusUsingLink to test this case.

          FSDirectory#getPreferredBlockSize calls getNode(filename, false ) // shouldn't resolveLink be true

          Fixed. Looks like DFSClient#getBlockSize is the only caller and isn't covered by any unit tests (you can remove the method and compile hdfs). Filed HDFS-929 to either remove it or add a test.

          I replaced the throw of UnresolvedPathException in FSNameSystem, it wasn't necessary as the call to getFileINode resolves the links. There's now only a single place we throw UnresolvedPathException which was my original intent.

          Latest patch attached here and to HADOOP-6421.

          Thanks,
          Eli

          Eli Collins added a comment -

          Thanks for the input. There's a test plan attached to HDFS-254 that covers specific cases. The ones you mentioned are covered in TestFcHdfsSymlink#testLinkAcrossFileSystems and FileContextSymlinkBaseTest#testRecursiveLinks. There's over 1 kloc of tests across 4 or so classes, so I won't list them all. Unfortunately it's hard to do end-to-end tests right now (e.g. distcp) as most of the existing tests, MR, clients (e.g. FsShell) and some of the file system implementations still use FileSystem rather than FileContext and AbstractFileSystem. I filed HDFS-765 to track migrating off FileSystem, so we will get additional FileContext (and the new features it has, like symlinks) coverage when we start moving more tests, file system implementations, and clients to FileContext and AFS.

          Tsz Wo Nicholas Sze added a comment -

          You may already have it: I think we need tests for cycles and different file system schemes, e.g.

            hdfs://namenode/foo/link1  (a symlink pointing to s3://hostname/bar)
            s3://hostname/bar/link2  (a symlink pointing to hdfs://namenode/foo)
          

          Then, test various commands like ls, cp, distcp, etc.
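
          For illustration, a sketch of what a cycle test could look like with the FileContext API; the test scaffolding (fc bound to a test cluster) and the expectation that resolution eventually fails with an IOException, rather than looping forever, are assumptions here.

          // Hypothetical JUnit sketch; assumes fc is a FileContext bound to the test cluster.
          @Test
          public void testSymlinkCycle() throws Exception {
            Path link1 = new Path("/test/link1");
            Path link2 = new Path("/test/link2");
            fc.createSymlink(link2, link1, true);  // dangling links are allowed
            fc.createSymlink(link1, link2, true);
            try {
              fc.open(link1);
              fail("expected resolution of a cyclic link chain to fail");
            } catch (IOException e) {
              // expected: the resolver gives up after a bounded number of hops
            }
          }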

          Sanjay Radia added a comment -
          • File Jira to resolve dot relative and volume-root relative names locally in HDFS (this Jira can be completed later)
          • explicitly declare the unresolvedLinkException in methods that throw them- we are moving towards declaring all exceptions (see HDFS-717)
            • ClientProtocol, Namenode, FSNamesystem, etc
          • UnresolvedPathException
            • suggestion: adding a method called "getUnresolvedPath" for completeness.
          • createSymlinkXXX -don't pass in permission object as parameter - it suggests that permissions matter for symlinks
            • let the FSDirectory#addSymlink create the permission object (use a zero or null permission)
          • FSDirectory#addToParent - if not a symlink, then pass null instead of a null string.
          • getExistingPathINodes()
            • typo: javadoc for param resolveLink is missing the param name
            • Add a comment that the exception is thrown when a symlink occurs in the middle of the path.
            • The multiple if statements with same or negation of condition - can be confusing and error prone in future
              if (curNode.isLink() && (resolveLink || (count < components.length - 1))) {
                      NameNode.stateChangeLog.debug("getExistingPathINodes UnresolvedPathException " +
                                                  ....
                      throw new UnresolvedPathException(constructPath(components, 0), ...);                
              }
              if (count == components.length - 1) {
                      break; // no more children, stop here
              }
              
          • FSDirectory#getFileBlocks add:
            if (symlink) return null.
          • FSDirectory#getListing()
            calls getNode(srcs, false) // shouldn't resolveLink be true
          • FSDirectory#getPreferredBlockSize
            calls getNode(filename, false ) // shouldn't resolveLink be true

          I still need to review the tests.

          Eli Collins added a comment -

          Patch attached. Contains the HDFS side of the changes for HADOOP-6421 and a small reworking of permissions for createSymlink.

          Eli Collins added a comment -

          Latest patch attached. It addresses feedback from Sanjay's review of the common patch on HADOOP-6421, mostly just pulls the bulk of the symlink tests into a common class.

          Eli Collins added a comment -

          Attached a minor update to the last patch to address javadoc and checkstyle.

          Eli Collins added a comment -

          Minor update to the last patch to address Cos's comment in HDFS-254.

          Eli Collins added a comment -

          Here's the latest patch that implements the resolution behavior posted to HDFS-245 that Doug and Sanjay and I discussed. Rebased against trunk and passes test-core.

          Eli Collins added a comment -

          Here's the final link resolution behavior based on discussion in HADOOP-6427. Basically we dropped (for now) links that resolve relative to the default file system of the file context accessing the link. Updated description follows:

          Given a symbolic link of form:
          
          <---X--->
          fs://host/A/B/link   
          <-----Y----->
          
          X is the scheme and authority that identify the file system, and Y is the
          path leading up to the symlink "link". If Y is a symlink itself then
          let Y' be the target of Y and X' be the scheme and authority of Y'.
          
          Then link targets of the following form resolve as follows:
          
          1. Fully qualified URI
          
           fs://hostX/A/B/file   Resolved according to the target file system.
          
          
          2. Partially qualified (scheme but no host) URI
          
            fs:///A/B/file        Resolved according to the target file system. Eg
                                 resolving a link to hdfs:///A will result in an 
                                 exception because HDFS URIs must be fully qualified, 
                                 while a link to file:///A will not since Hadoop's 
                                 local file systems require partially qualified URIs.
          
          3. Relative path
          
           path                  Resolves to <Y'><path>. Eg if Y resolves to 
                                 hdfs://host/A and path is "../B/file" then
                                 <Y'><path> is hdfs://host/B/file
          
          4. Absolute path
          
           path                  Resolves to <X'><path>. Eg if Y resolves
                                 hdfs://host/A/B and path is "/file" then
                                 <X'><path> is hdfs://host/file
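
          Rules 3 and 4 reduce to simple path arithmetic once Y' and X' are known; the sketch below is not the actual resolution code in FSDirectory or FileContext, just a standalone demonstration of those two rules.

          import java.net.URI;

          public class LinkTargetResolution {
            // Resolve a link target against the already-resolved parent Y' (rule 3)
            // or against its scheme and authority X' (rule 4). Illustrative only.
            static URI resolve(URI yPrime, String target) {
              if (target.startsWith("/")) {
                // Rule 4: an absolute path resolves against X', the scheme+authority of Y'.
                return URI.create(yPrime.getScheme() + "://" + yPrime.getAuthority() + target);
              }
              // Rule 3: a relative path resolves against Y' itself.
              return yPrime.resolve(target).normalize();
            }

            public static void main(String[] args) {
              URI yPrime = URI.create("hdfs://host/A/");
              System.out.println(resolve(yPrime, "../B/file")); // hdfs://host/B/file
              System.out.println(resolve(yPrime, "/file"));     // hdfs://host/file
            }
          }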
          
          Eli Collins added a comment -

          FYI discussion of what the syntax we should use to indicate path resolution uses the client's file system is on HADOOP-6427.

          Eli Collins added a comment -

          Latest patch.

          • Implements path resolution as discussed in the above comment and HADOOP-6427
          • Additional tests in TestLink to cover the above
          • Resolved against trunk
          • See HADOOP-6421 for the common changes
          dhruba borthakur added a comment -

          Your latest proposal looks good.

          Doug Cutting added a comment -

          +1 This looks right to me. Links should be resolved against the client's context.

          Eli Collins added a comment -

          Before modifying the patch to take Doug's suggestion from HADOOP-6427 (the behavior for partially qualified targets), I wanted to get a +1 or two on the following symlink resolution behavior.

          Given a symbolic link of the following form:
          
          <----F---->
          hdfs://host/A/B/link   
          <------P------>
          
          If P points to P' then F' is the F part (scheme and host that identifies 
          a file system) of P'.  If P is not a symlink then P' = P and F' = F.
          
          Then link targets of the following type resolve as follows:
          
          1. Fully qualified
          
           <---------T--------->
           hdfs://hostX/A/B/file    <T>  Fully qualified paths are simple.
           
          
          2. Partially qualified (scheme but no host)
          
                   <--A--->
           hdfs:///A/B/file         <C><A> where C is the default file system of
                                    the file context resolving the link, eg if
                                    C = file:/// then <C><A> = file:///A/B/file
                                    Think an NFS mount that links to /tmp.
          
          3. Relative
          
           <-T-> 
 file                     <P'><T> eg if P = hdfs://host1/XXX resolves
                          to hdfs://host1/YYY then <P'><T> is
                          hdfs://host1/YYY/file
          
          4. Absolute
          
           <---T--->
           /A/B/file                <F'><T> 
                                    eg if P = hdfs://host1/XXX
                                    and  P' = hdfs://host2/YYY
                                    <F'><T> = hdfs://host2/A/B/file
          
          Eli Collins added a comment -

          Updated design doc.

          Eli Collins added a comment -

          Latest patch. Updated the implementation to reflect semantics discussed in HADOOP-6427 and requisite tests.

          Eli Collins added a comment -

          Latest patch. Jives with the one posted to HADOOP-6421.

          Eli Collins added a comment -

          Attached latest patch. Reflects changes necessary for adding symlinks to LocalFs and more tests, eg symlinks across file systems.

          Eli Collins added a comment -

          Latest patch attached. Broke out the one huge test function into a bunch of junit4 style individual tests, and added more tests from the test plan.

          Eli Collins added a comment -

          Filed HADOOP-6421 for the common parts.

          Eli Collins added a comment -

          Hey Sanjay,

          Thanks for taking a look.

          * Add final to parameters of the new methods you have added.

          Fixed.

          * GetLinkTarget() - should this be public?

          Good catch, made this protected.

          * Resolve() checks isUriPathAbsolute(). Actually you also need to check that the path is fully qualified (ie has a scheme). So if a Slash-relative path name is returned (a mistake in file system impl) the patch will incorrectly resolve it relative to the FileContext's default filesystem (ie its Slash). The file system should have resolved it internally in the first place. Hence it would be correct (but inefficient) for the client side to send this back to the file system in which the symlink occurred; we should simply report this as an error rather than send it back to the file system.

          Good point, createSymlink in FileContext now makes the link target fully qualified. I added tests to TestLink that use fully qualified paths for both the link and the target. I also added an isFullyQualified method to Path (and tests to TestPath) and updated the comment for isAbsolute to match the implementation (which looks correct).

          Note that a slash-relative path being returned is currently not a mistake in the file system impl., as the client handles all UnresolvedPathExceptions. Making the NN transparently resolve links when the target is in its namespace is definitely a worthwhile optimization. Need to add more tests first.

          * Why pass the "this" as a parameter to resolve(this, path) - the FileContext is the "this" instance any way.

          Not following, resolve is a method of FSLinkResolver not FileContext.
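
          For readers following along, the pattern being discussed is roughly the loop below: retry the operation, follow one link per UnresolvedLinkException, and give up after a fixed number of hops. This is an illustrative sketch only; the class name, the hop limit, and the qualifyAfterLink step are hypothetical, not the patch's actual code.

            import java.io.IOException;
            import org.apache.hadoop.fs.Path;
            import org.apache.hadoop.fs.UnresolvedLinkException;

            /** Sketch of an FSLinkResolver-style retry loop (names are illustrative). */
            abstract class LinkResolveLoop<T> {
              private static final int MAX_LINK_HOPS = 32; // loop guard; value is hypothetical

              /** The operation to perform against the current, possibly unresolved, path. */
              abstract T next(Path p) throws IOException, UnresolvedLinkException;

              /** Re-qualify the remaining path against the link target (details elided). */
              abstract Path qualifyAfterLink(Path p, UnresolvedLinkException e) throws IOException;

              T resolve(Path path) throws IOException {
                for (int hops = 0; hops < MAX_LINK_HOPS; hops++) {
                  try {
                    return next(path);                // succeeds once no unresolved link remains
                  } catch (UnresolvedLinkException e) {
                    path = qualifyAfterLink(path, e); // follow one link, then retry
                  }
                }
                throw new IOException("Possible cyclic symlink: " + path);
              }
            }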

          Thanks,
          Eli

          Sanjay Radia added a comment -

          Sorry for my delay in giving feedback on this; I keep on getting pulled to do other things.
          I have not finished the complete review but have the following feedback for the FileContext part of the patch so far:

          • Add final to parameters of the new methods you have added.
          • GetLinkTarget() - should this be public?
          • Resolve() checks isUriPathAbsolute(). Actually you also need to check that the path is fully qualified (ie has a scheme). So if a Slash-relative path name is returned (a mistake in file system impl) the patch will incorrectly resolve it relative to the FileContext's default filesystem (ie its Slash). The file system should have resolved it internally in the first place. Hence it would be correct (but inefficient) for the client side to send this back to the file system in which the symlink occurred; we should simply report this as an error rather than send it back to the file system.
          • Why pass the "this" as a parameter to resolve(this, path) - the FileContext is the "this" instance any way.

          So far the code is looking good and clean. I will send you more comments over the next few days.

          Eli Collins added a comment -

          Uploaded patches per the last comment that incorporate Doug's review and minor fixes like getting the offline image viewer test to work with symlinks. Also verified that hdfs oiv using an fsimage with symlinks looks OK.

          Eli Collins added a comment -

          Hey Doug, Thanks for taking a look.

          > Looking at the patch: the javadoc refers to UnresolvedPathException when it should read UnresolvedLinkException

          Assume you're referring to FileContext. These shouldn't be there; I removed them and updated the corresponding comments in AbstractFileSystem to indicate that getFileStatus throws an UnresolvedPathException if it encounters a symlink in the given path and that getFileLinkStatus throws an UnresolvedPathException if it encounters a symlink in the path leading up to the file or symlink that the given path refers to.

          > under what conditions would FileContext#getFileStatus or getLinkFileStatus throw UnresolvedLinkException? it looks to me like this exception is always handled internally. if a filesystem doesn't support symlinks, or the symlink is broken, then other exceptions are thrown. this exception appears to be used purely for internal control purposes and shouldn't be seen by end users so far as i can tell. its visibility should be "limited private", i think.

          You're right, it's always handled internally; I inadvertently added it there instead of AbstractFileSystem. I moved the comments (above) and made UnresolvedLinkException LimitedPrivate for HDFS.

          > shouldn't FilterFS#getFileLinkStatus() call getFileLinkStatus instead of getFileStatus?

          Yup, fixed. I also added a comment to getFileLinkStatus ("FileSystem does not support symlinks") explaining why it does not call fsImpl.getFileLinkStatus.

          > in FSLinkResolver, what's the point of the 'catch (IOException e) { throw e; }'?

          I suspect it was to pass any non-UnresolvedLinkExceptions up to the caller; that will happen naturally, so it's not needed. I removed it.

          I also forgot to bump the versionID in ClientProtocol, fixed that.

          Also noticed that the new fs image version wasn't being tested in
          TestOfflineImageViewer, will fix that and upload a new patch.

          Thanks,
          Eli

          dhruba borthakur added a comment -

          > move the symlink implementation off FileSystem and onto AbstractFileSystem, decided not to have a duplicate implementation for FileSystem

          +1 to the above.

          Doug Cutting added a comment -

          > decided not to have a duplicate implementation for FileSystem since clients are moving off it, reasonable?

          +1

          Looking at the patch:

          • the javadoc refers to UnresolvedPathException when it should read UnresolvedLinkException, I think.
          • under what conditions would FileContext#getFileStatus or getLinkFileStatus throw UnresolvedLinkException? it looks to me like this exception is always handled internally. if a filesystem doesn't support symlinks, or the symlink is broken, then other exceptions are thrown. this exception appears to be used purely for internal control purposes and shouldn't be seen by end users so far as i can tell. its visibility should be "limited private", i think.
          • shouldn't FilterFS#getFileLinkStatus() call getFileLinkStatus instead of getFileStatus?
          • in FSLinkResolver, what's the point of the 'catch (IOException e) { throw e; }'?

          Eli Collins added a comment -

          Here are patches that merge in the latest changes on trunk and move the symlink implementation off FileSystem and onto AbstractFileSystem, decided not to have a duplicate implementation for FileSystem since clients are moving off it, reasonable?

          Eli Collins added a comment -

          Here are patches that combine Dhruba's latest symlink patch with the delta I posted that implements the API changes (modulo incorporating feedback from Dhruba like removing isLink in FileContext since isFile/isDirectory will be deprecated), adds more tests, and resolves merge conflicts. It passes test-core on both repos. I'm adding more tests but thought I'd ping the JIRA in case there's any additional feedback.

          Eli Collins added a comment -

          Here are patches that add more tests and implement the above FileContext API. They are deltas against symlink19 so it's easier to review what's changed. They also contain some resolutions from merging with trunk. I'm writing more tests but wanted to post what I have so far for feedback.

          Note that they change the default behavior to fully resolve symlinks and add an API equivalent to lstat for clients that want to operate on symlinks. The FileContext API could have been implemented without adding a new ClientProtocol function by creating a function resolveFully that calls getLinkTarget on the return value of resolve, but that would require an additional round trip for all FileContext functions that need to fully resolve links.

          Eli Collins added a comment -

          While writing tests I noticed the current API doesn't match POSIX semantics that closely: if a path refers to a symlink then the symlink is not resolved, eg getFileStatus returns the FileStatus of the link rather than what it points to (ie it behaves like lstat rather than stat), ditto for open, setReplication etc. While some APIs should act on the symlink itself (eg rename, delete) others need symlinks fully resolved. The design doc should specify the intended behavior of the FileContext API wrt symlinks. I attached an updated version and pasted the relevant section below. What do people think?

          FileContext APIs

          This section specifies the behavior of the FileContext API when links are present in paths. The intent is to match POSIX semantics. For most functions, if symlinks are supported, all links leading up to the target of a path should automatically be resolved. Some functions will not resolve any links in a given path. Some functions will, if given a path that refers to a symlink, operate on the target of the symlink, while others will operate on the symlink itself. For example, setReplication and getFileBlockLocations act on the symlink target while delete and getLinkFileStatus act on the symlink itself. Behavior is specified both for filesystems that do and do not support symlinks. To support symlink-aware utilities the FileContext API requires some new interfaces (eg equivalent to lstat) to indicate whether a path refers to a symlink.

          • create, mkdir – the path should not refer to a symlink since the path must not currently exist.
          • delete, deleteOnExit – if path refers to a symlink then the symlink is removed (like unlink).
          • open – if the given path refers to a symlink then the path is fully resolved.
          • set|getWorkingDirectory – if the given path refers to a symlink then the symlink is fully resolved when setting the working directory, ie if the working directory is changed to /link1/link2 then subsequent queries of the working directory should return whatever link2 points to.
          • setReplication – if the given path refers to a symlink then the path is fully resolved.
          • setPermission – if the given path refers to a symlink then the path is fully resolved (like chmod). Symlink access is determined by permissions of the target of the symlink.
          • setOwner – if the given path refers to a symlink then the path is fully resolved (like chown). We could add an lchown equivalent in the future.
          • setTimes – if the given path refers to a symlink then the path is fully resolved. Symlinks do not have access times.
          • get|setFileChecksum – if the given path refers to a symlink then the path is fully resolved, ie there are no checksums associated with symlinks.
          • getFileStatus – if the given path refers to a symlink then the path is fully resolved, ie returns the FileStatus of the file or directory the symlink points to.
          • new getLinkFileStatus – like lstat, if the given path refers to a symlink then the FileStatus of the symlink is returned, otherwise the result is the same as if getFileStatus was called. If symlink support is not enabled or the underlying filesystem does not support symlinks then the results are the same as if getFileStatus was called.
          • isDirectory, isFile – if the given path refers to a symlink then the path is fully resolved, ie if the symlink points to a directory then isDirectory returns true.
          • new isLink – returns true if the given path refers to a symlink. If symlink support is not enabled or the underlying filesystem does not support symlinks then isLink returns false.
          • listStatus – if the given path refers to a symlink then the path is fully resolved, ie the result is equivalent to calling listStatus with the target of the symlink.
          • getFileBlockLocations – if the given path refers to a symlink then the path is fully resolved, ie symlinks are not associated with blocks.
          • getFsStatus – if the given path refers to a symlink then the path is fully resolved, ie the FsStatus of the target of the symlink is returned.
          • getLinkTarget – only the first symlink in the given path is resolved. If symlink support is not enabled or the underlying filesystem does not support symlinks then an IOException is thrown.
          • resolve – all symlinks in the given path are resolved. If symlink support is not enabled or the underlying filesystem does not support symlinks then no symlinks are resolved.
          • createSymlink(oldpath, newpath)
            • newpath should not refer to a symlink since the path must not currently exist.
            • No symlinks are resolved in oldpath. For example, if /link1 points to /dir, and /link1/link2 points to /link1/file, then createSymlink("/link1/file", "/link1/link2") points link2 to /link1/file (not /dir/file). The path /link1/link2 resolves as follows: /dir/link2 -> /link1/file -> /dir/file.
            • If symlink support is not enabled or the underlying filesystem does not support symlinks then an IOException is thrown.
          • rename(oldpath, newpath)
            • if oldpath refers to a symlink, the symlink is renamed (POSIX)
            • if newpath refers to a symlink, the symlink is over-written (POSIX), if the OVERWRITE option is passed.
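
          As a concrete illustration of the stat/lstat split specified above, here is a short usage sketch against the proposed API. The method names getLinkFileStatus, isLink and createSymlink follow the list in this comment; they are proposals and may not match the eventually committed API, so the sketch is illustrative only.

            import org.apache.hadoop.fs.FileContext;
            import org.apache.hadoop.fs.FileStatus;
            import org.apache.hadoop.fs.Path;

            // Usage sketch of the semantics proposed above (names per this comment,
            // not necessarily the final API).
            public class StatVsLstatExample {
              public static void main(String[] args) throws Exception {
                FileContext fc = FileContext.getFileContext();

                // Suppose /dir exists; create /link1 pointing at it.
                fc.createSymlink(new Path("/dir"), new Path("/link1"), false);

                FileStatus followed = fc.getFileStatus(new Path("/link1"));     // stat: status of /dir
                FileStatus ofLink   = fc.getLinkFileStatus(new Path("/link1")); // lstat: status of the link
                boolean link        = fc.isLink(new Path("/link1"));            // true per the proposal

                System.out.println(followed.isDir()); // true, since /dir is a directory
                System.out.println(ofLink.getPath()); // the link itself
                System.out.println(link);
              }
            }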
          dhruba borthakur added a comment -

          Hi Eli, the latest patch should address your concerns about getLinkTarget(). The unit test TestLink runs successfully now.

          dhruba borthakur added a comment -

          Unit test TestLink runs successfully.

          dhruba borthakur added a comment -

          I am comfortable with making the changes resolve links in FileContext instead of FileSystem. I also removed the optimization to pass in the exception object to getLinkTarget(). I also fixed the default implementations of createSymLink and getLinkTarget().

          Sanjay Radia added a comment -

          >Unlike other filesystem objects, symbolic links do not have an owner, group, access mode, times, etc. Instead, these attributes are taken from the directory that contains the link.

          hmmm ... since the owner and group are checked for removal/renaming of symlinks, I thought the owner and group of the symlink itself can be different from that of the parent directory.

          Doug Cutting added a comment -

          Dhruba, overall I like this patch. Are you able to live with accessing this through FileContext instead of FileSystem?

          > Please note that the signature of FileContext.getLinkTarget(Path p, UnresolvedLinkException e)
          > is such that the DFSClient can use the contents of the exception to resolve the link entirely in the
          > client-side without making another RPC to the namenode.
          > The tradeoff is that the API does not look elegant. Thoughts?

          I'd prefer not to use exceptions to pass data like this. If we need to implement this optimization then I'd prefer we make it explicit. Perhaps we can revisit this in the AbstractFileSystem patch, and commit this issue without passing that exception as a parameter?

          A few nits:

          • the default definition of getLinkTarget() should probably throw an exception rather than return null, no?
          • the default definition of createSymLink() should also probably throw an exception.
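
          For example, such defaults might look roughly like the sketch below. This is illustrative only; the stub class name is hypothetical, and the exact exception type and messages in the patch may differ.

            import java.io.IOException;
            import org.apache.hadoop.fs.Path;

            /** Sketch of default stubs for a file system without symlink support. */
            abstract class NoSymlinkDefaults {
              public Path getLinkTarget(Path p) throws IOException {
                throw new IOException("File system does not support symlinks: " + p);
              }

              public void createSymlink(Path target, Path link, boolean createParent)
                  throws IOException {
                throw new IOException("File system does not support symlinks");
              }
            }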
          Eli Collins added a comment -

          Will check out the diffs. Wasn't the idea behind getLinkTarget (and using an additional RPC) in proposal IV to be the middle ground between the two solutions that don't require an extra RPC (putting it in an exception or in the normal return values) but on which there was no agreement? Ie does Konstantin's comment still apply: http://issues.apache.org/jira/browse/HDFS-245?focusedCommentId=12638152&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12638152

          dhruba borthakur added a comment -

          Moved the path resolving code to FileContext. Also renamed MountPointException to UnresolvedLinkException. Please note that the signature of FileContext.getLinkTarget(Path p, UnresolvedLinkException e) is such that the DFSClient can use the contents of the exception to resolve the link entirely in the client-side without making another RPC to the namenode. The tradeoff is that the API does not look elegant. Thoughts?

          Eli Collins added a comment -

          Hey Dhruba,

          Patch looks good, some comments:

          MountPointException – why extend IOException so that function prototypes don't have to be modified to declare it? Don't we want the interfaces to explicitly declare this, eg as is done with UnresolvedPathException?

          Why do we need both MountPointException (now UnresolvedLinkException) and UnresolvedPathException?

          Why does getLinkTargetImpl need to take an exception, shouldn't it call into the NN to resolve the given path?

          How can createSymlink throw DSQuotaExceededException? Seems like createSymlinkInternal won't (since symlinks have no contents).

          Agree with making maxPathLinks in range 8-32

          The NN throws UnresolvedPathException even when it could resolve the link internally; perhaps add a comment noting that resolving such links on the NN could be a future optimization.

          Nit, constructPath should use StringBuilder

          Allen Wittenauer added a comment -

          The reason for chown -h and friends is simple: what happens when a link needs to move or be controlled by a different entity than the same one as the directory? Think access control! If I want to force a user to use a different server for data (esp RO), having the link owned by them defeats the purpose. I don't have enough fingers or toes to count the number of times I've had to use this under UNIX. (or, at least, UNIXes that conformed with IEEE Std 1003.1... )

          Doug Cutting added a comment -

          > The FileContext API is not yet clearly baked as far as I can tell.

          It's committed. It needs to get used. Adding the symlink feature there will get both used and tested without altering existing codepaths.

          > I'm fine with only doing it in file context... if we wait to checkin symlinks until FileSystem is fully deprecated.

          I don't follow your logic here, Eric. Doesn't it make sense to first add a new API in a trial manner? Pioneers can work out issues with symbolic links before others are forced to move to the new API.

          Symbolic links can be added to FileContext in such a way that the only changes to FileSystem would be to add a method, getLinkTarget(), that no existing code calls. So the changes to Common are very low risk this way.

          Separately we can then have a patch that adds symbolic link support to HDFS. The primary change to existing codepaths here would be changing places where FileNotFoundException would be thrown to instead throw UnresolvedLinkException when the path to the file contains a symbolic link. The larger risk/complication in HDFS is that links would be added to FileStatus, and hence the namenode log format would need to be extended and we'd need a downgrade story. So this patch would have more risk. But it can be considered separately.

          Additionally we can have patches that add support for local FS, S3, etc.

          dhruba borthakur added a comment -

          > I disagree. We need chown -h (POSIX) and chown -H, chown -L, and chown -P (SVID?) for the same reasons....

          @Allen, can you please elaborate on the reasons? From the lstat(2) man page: "Unlike other filesystem objects, symbolic links do not have an owner, group, access mode, times, etc. Instead, these attributes are taken from the directory that contains the link. The only attributes returned from an lstat() that refer to the symbolic link itself are the file type (S_IFLNK), size, blocks, and link count (always 1)."

          dhruba borthakur added a comment -

          Completely agree that we should make the APIs as close to POSIX as possible. And I do not see this patch as part of 0.21 at all. Maybe part of trunk at some distant time

          eric baldeschwieler added a comment -

          So who is going to take out the trash if we half implement this now? Half-implemented features have a way of really turning around and biting us. I'd hate to see us move the system to a place where the API was not well defined or was inconsistent about the handling of something this basic.

          There are a lot of spec questions still open! I'd suggest this feature still has a lot of work to go and should either go into a branch or live as an expanding patch until folks feel it is complete. It is certainly not ready to be backported to 21.

          I'm fine with only doing it in file context... if we wait to checkin symlinks until FileSystem is fully deprecated.

          I'd suggest that we should follow the POSIX conventions here as closely as possible. We should understand why we are diverging anywhere we do diverge. This discussion has convinced me we need a pretty complete spec & a lot of testing before we can add this. It will be easier to build a spec if we start with POSIX's and then look for problems...

          dhruba borthakur added a comment -

          @Robert: The permission bits on a symbolic link are redundant. They are never checked. The permission bits on the target of the symbolic link is what matters.

          Robert Chansler added a comment -

          I don't much care what happens on Day Zero, only that it is specified in advance so that an informed decision can be made about whether to use symlinks. Similarly, in reply to Allen, it's not so much what lstat returns, only that it is specified. (We do need to be very clear on permissions for links.)

          dhruba borthakur added a comment -

          The FileContext API is not yet fully baked as far as I can tell. I would like the FileSystem API to do the right thing about symlinks until the time we deprecate it and make apps avoid FileSystem completely. I think we should avoid making apps use FileContext if they want to use symlinks. If one is worried about how existing apps behave in the presence of symlinks, then that administrator can set dfs.support.symlinks to false. This is a server-side config that prevents the creation of symlinks.

          distcp and har should avoid following symlinks by default. I will create a JIRA that will make these utilities support an option to allow following symlinks.

          @Robert: I understand your concern about day-zero behaviour. Will it satisfy you if dfs.support.symlinks is set to false by default?

          @Allen: fsck just reports the number of symlinks it found. It does not follow them. Also, this patch does not implement "chown -h (POSIX) and chown -H, chown -L, and chown -P (SVID?)". Can I implement them in a subsequent JIRA?

          Doug Cutting added a comment -

          Perhaps FileSystem#getFileStatus() of a symbolic link should not attempt resolution, but instead return information directly about the link. (Similarly for AbstractFileSystem, once that's added.) But FileContext#getFileStatus() should resolve the link, returning information about the target.

          Allen Wittenauer added a comment -

          (From the lstat(2) man page: "Unlike other filesystem objects, symbolic links do not have an owner, group, access mode, times, etc. Instead, these attributes are taken from the directory that contains the link. The only attributes returned from an lstat() that refer to the symbolic link itself are the file type (S_IFLNK), size, blocks, and link count (always 1)."

          Danger! lstat() implementations differ between the various flavors! IEEE Std 1003.1, 2004 http://www.opengroup.org/onlinepubs/009695399/functions/lstat.html says:

          The lstat() function shall be equivalent to stat(), except when path refers to a symbolic link. In that case lstat() shall return information about the link, while stat() shall return information about the file the link references.

          Sanjay Radia added a comment -

          @doug
          >If we're really on the verge of deprecating FileSystem (HADOOP-6223) then perhaps we should not invest effort adding new features there. ...

          I think Facebook was hoping to use this to replace their internal patch, which uses FileSystem. Your thoughts, Dhruba?

          Where should the patch go in the new stack (FileContext, AbstractFileSystem)?
          I believe it belongs inside the FileContext layer (or at least above the AbstractFileSystem layer) since it is the FileContext layer that understands the global URI name space. Each AbstractFileSystem deals with only
          its local namespace within the filesystem, and hence it is "wrong" for the AbstractFileSystem to follow symlinks to a foreign location.

          > MountPointException would be better named something like UnresolvedLinkException. ...
          Agreed.

          Eli Collins added a comment -

          Apologies for the dupe, looks like jira stripped out the quoted text in my email above.

          Thanks Nigel, Allen, Robert for taking a look. Will update the doc with feedback.

          > Loops should be avoided by having the client limit the number of links it will traverse. What about loops within a filesystem? Does the NN also limit the number of links it will traverse?

          Good point, the NN should do the same, and use the same limit for consistency.

          > You give examples of commands that operate on the link, examples that operate on the link target, and examples of those that depend on a trailing slash. Given this is a design, can you be explicit and enumerate the commands for each of these? For instance, setReplication, setTimes, du, etc.

          Will do, how about as part of the user guide?

          > What are the new options for fsck to report dangling links? What does the output look like?

          Agree with Allen. Updated to read "fsck should not follow or report dangling links".

          > What is the new option for distcp to follow symlinks?

          Probably don't need one, seems like distcp should follow links and...

          > If distcp doesn't follow symlinks, I assume it just copies the symlink. In this case, is the symlink adjusted to point to the source location on the source FS?

          ...would not adjust links when it's copying them.

          > What does the ls output look like for a symlink?

          Not sure the design doc needs to specify, but how about the usual fileName -> path?

          > Do symlinks contribute bytes toward a quota?

          Agree w/ Rob, links should not count towards quota (they have no storage after all).

          > access control properties of links are ignored, checks are always performed against the link target (if it resides in an HDFS file system). The ch* operations should operate directly on the target.

          > I disagree. We need chown -h (POSIX) and chown -H, chown -L, and chown -P (SVID?) for the same reasons that those OSes support them.

          You're right, the above should be chmod (not ch*). And checks against the link target should be performed regardless of what file system it resides on. And yes, ch* utilities need to be link-aware.

          > We also need test -h and test -L supported.

          Agreed. Seems like these should all go in the user guide, unless people want them spelled out in the design doc.

          Wrt lsr and rmr, I'd make them not follow links by default and add options that do (can detect loops by remembering the paths they've visited).

          +1 to Doug's idea of exposing links solely via FileContext, so that day zero behavior ignores links and clients get updated to be link aware as necessary. Are people OK updating them as they move to FileContext?

          Thanks,
          Eli

          Eli Collins added a comment -

          Thanks Nigel, Allen, Robert for taking a look. Will update the doc
          with feedback.

          Good point, the NN should do the same, use the same limit for consistency.

          Agree with Allen. Updated to read "fsck should not follow or report
          dangling links"

          Probably don't need one, seems like distcp shouldn't follow links and...

          would not adjust links when it's copying them.

          Not sure the design doc needs to specify but how about the usual
          fileName -> path?

          Agree w/ Rob, links should not count towards quota (they have no
          storage after all).

          You're right, the above should be chmod (not ch*). And checks against
          the link target should be performed regardless of what file system it
          resides on. And yes ch* utilities need to be link-aware.

          Agreed. Seems like these should all go in the user guide, unless
          people want them spelled out in the design doc.

          Wrt lsr and rmr, I'd make them not follow links by default and add
          options that do (can detect loops by remembering the paths they've
          visited).

          +1 to Doug's idea of exposing links solely via FileContext, so that
          day zero behavior ignores links and clients get updated to be link
          aware as necessary. Are people OK updating them as they move to
          FileContext?

          Thanks,
          Eli

          Doug Cutting added a comment -

          > But on Day Zero, are links in command arguments always followed?

          If we follow my suggestion of not implementing links except through the FileContext API, then, on Day Zero, existing code, none of which yet uses FileContext, would not follow links. Symbolic links accessed via the old API could simply appear as empty files: they'd have zero length and an empty blocklist. Then, as we update tools to use the FileContext API, we can decide how each should process symbolic links. Would that help?

          Robert Chansler added a comment -

          A further thought (perhaps implied by Allen's remarks) is that there needs to be a specification of Day Zero behavior for shell commands. I'm not opposed to adding options separately. But on Day Zero, are links in command arguments always followed? Does lsr follow links? If so, how does lsr avoid loops? rmr? And what about permissions for links? What is the moral equivalent of lstat(2)? (From the lstat(2) man page: "Unlike other filesystem objects, symbolic links do not have an owner, group, access mode, times, etc. Instead, these attributes are taken from the directory that contains the link. The only attributes returned from an lstat() that refer to the symbolic link itself are the file type (S_IFLNK), size, blocks, and link count (always 1).")

          A link is a name, and so should count against the name quota, but not against the space quota.

          fsck should never follow links. Nor is there any meaningful sense to whether a link is "broken." When resolution is attempted, a link either resolves or it doesn't.

          Allen Wittenauer added a comment -

          > What are the new options for fsck to report dangling links? What does the output look like?

          Why should fsck report this at all? Since symlinks take the form of a URL, fsck would need to make calls out to a potentially remote system to verify. Over a large file system with many symlinks, this could be a tremendous amount of network bandwidth. IIRC, no other OS has a "report all broken symlinks" functionality. The reality is that very few symlinks are "important" enough to warrant that much scrutiny. Instead, there is a reliance upon savvy admins to write scripts using find to locate symlinks and verify. [But since we have no find ... ]

          > access control properties of links are ignored, checks are always performed against the link target (if it resides in an HDFS file system). The ch* operations should operate directly on the target.

          I disagree. We need chown -h (POSIX) and chown -H, chown -L, and chown -P (SVID?) for the same reasons that those OSes support them.

          We also need test -h and test -L supported.

          Doug Cutting added a comment -

          A few comments on the common patch:

          • If we're really on the verge of deprecating FileSystem (HADOOP-6223) then perhaps we should not invest effort adding new features there. We could instead mostly leave FileSystem alone. Symlinks would only be implemented for applications that use the FileContext API (which might help encourage folks to move to that API). Until HADOOP-6223 is committed, existing FileSystem implementations can be modified to throw MountPointException, and we'd still need to add the getLinkTarget() method, but I don't think we'd need to make many other changes to FileSystem. This also reduces risk, since existing application code paths are altered minimally. What do folks think?
          • maxPathLinks should probably be less than 1000. Earlier I suggested this should be somewhere between 8 (POSIX minimum) and 32 and perhaps configurable.
          • MountPointException would be better named something like UnresolvedLinkException. We don't elsewhere use the term "mount point" and there's no need to introduce it here.
          dhruba borthakur added a comment -

          Here is a patch that conforms to the modified consensus arrived at earlier in the week. I would like everybody to review the symlink17-common.txt patch that has changes to the FileSystem API.

          This code is not yet tested in great detail and is uploaded only for review purposes. (Also, this patch does not yet modify the FileContext API.)

          Nigel Daley added a comment -

          Thanks Eli. The design states:

          Loops should be avoided by having the client limit the number of links it will traverse

          What about loops within a filesystem? Does the NN also limit the number of links it will traverse?

          You give examples of commands that operate on the link, examples that operate on the link target, and examples of those that depend on a trailing slash. Given this is a design, can you be explicit and enumerate the commands for each of these? For instance, setReplication, setTimes, du, etc.

          What are the new options for fsck to report dangling links? What does the output look like?

          What is the new option for distcp to follow symlinks?

          If distcp doesn't follow symlinks, I assume it just copies the symlink. In this case, is the symlink adjusted to point to the source location on the source FS?

          What does the ls output look like for a symlink?

          Do symlinks contribute bytes toward a quota?

          Eli Collins added a comment -

          Uploaded a first draft of a design doc (followed the template in HADOOP-5587). Content follows. Lemme know where I missed the boat. Thanks, Eli

          Symlinks Design Doc

          Problem definition

          HDFS path resolution has the following limitations:

          • Files and directories are only accessible via a single path.
          • An HDFS namespace may not span multiple file systems.

          Symbolic links address these limitations by providing an additional level of indirection when resolving paths in an HDFS file system. A symbolic link is a special type of file that contains a path to another file or directory. Paths may be relative or absolute. Relative paths (eg ../user) provide an alternate path to a single file or directory in the file system. An absolute path may be relative to the current file system (eg /user) or specify a URL (eg hdfs://localhost:8020/foo) which allows the link to point to any file or directory irrespective of the source and destination file systems.

          Allowing multiple paths to resolve to the same file or directory, and HDFS namespaces to span multiple file systems, makes it easier to access files and manage their underlying storage.

          Use Cases

          If an application requires that data be available at a particular path, a symlink may be used in lieu of copying the data from its current location. For example, a user may want to create a symlink /data/latest that points to an existing directory so that the latest data is accessible via its current name and an alias, eg:

          $ hadoop fs -ln /data/20090922 /data/latest

          The user may eventually want to archive this data so that it's accessible but stored more efficiently. They could create an archive of the files in the 20090922 directory and make the original path a symlink to the HAR, eg:

          $ hadoop fs -ln har:///data/20090922.har /data/20090922

          They could also move the directory to another file system that is perhaps lightly loaded or less expensive and make the existing directory a symlink, eg:

          $ hadoop fs -ln hdfs://archive-host/data/20090922 /data/20090922

          The archival file system could also be accessible via an alternative protocol (eg FTP). In both cases the original data has moved but remains accessible by its original path.

          This technique can be used generally to balance storage by transparently making a namespace span multiple file systems. For example, if a particular subtree of a namespace outgrows the capabilities of the file system it resides on (eg Namenode performance, number of files, etc) it can be moved to a new file system and linked into its current path.

          A symbolic link is also a useful primitive that could be used to implement atomic rename within a file system by atomically rewriting a symbolic link, or to rename a file across partitions (if in the future file system metadata is partitioned across multiple hosts). See HADOOP-6240 for more info.

          Interaction with the Current System

          The user may interact with symbolic links via the shell or indirectly via applications (eg libhdfs clients like fuse mounts).

          Symbolic links are transparent to most operations though some may want to handle links specially. In general, the behavior should match POSIX where appropriate. Note that linking across file systems is somewhat equivalent to creating a symlink across mount points in POSIX.

          Path resolution:

          • Some commands operate on the link directly (eg stat, rm) if the link is the target.
          • Some commands (eg mv) should operate on the link target if a trailing slash is used (eg if /bar is a link that points to a directory, mv /foo /bar renames /foo to /bar, replacing the link, while mv /foo /bar/ moves /foo into the directory the link points to).
          • Symbolic links in archive URIs should fully resolve.
          • Some APIs should operate on the link target (eg setting access and modification times).

          Permissions: access control properties of links are ignored, checks are always performed against the link target (if it resides in an HDFS file system). The ch* operations should operate directly on the target.

          Some utilities need to be link-aware:

          • distcp should not follow links by default.
          • fsck should only look at each file once, could optionally report dangling links.
          • Symbolic links in HARs should be followed (so that a symlink to a HAR preserves the original path resolution behavior).

          Symbolic links exist independently of their targets, and may point to non-existing files or directories.

          Requirements

          Clients send the entire path to the Namenode for resolution. The path may contain multiple links:

          • If all links in the path are relative to the current file system then the Namenode transparently (to the client) resolves the path.
          • If the Namenode finds a link in the path that points outside the file system it must provide API(s) to report to the client that (a) it can not resolve a path (due to a link that points outside the file system) and (b) return the target of the link and the remainder of the path that still needs to be resolved.

          Symbolic links should be largely invisible to users of the client. However, symbolic links may introduce cycles into path resolution. For example, a link may point to another URI (on another file system) which points back to the link. Loops should be avoided by having the client limit the number of links it will traverse, and report to its user that the operation was not successful.

          Symbolic links should not introduce significant overhead in the common case, resolving paths without links. Resolving symbolic links may be a frequent operation. If links are being used to transparently span a namespace across multiple file systems then the "root" Namenode may do little work aside from resolving link paths. Therefore, link resolution should have reasonable performance overhead and limited side-effects on both the client and Namenode.

          Design

          One approach is to have the client first stat the file to see if it is a symbolic link and resolve the path. Once the path is fully resolved (this may require contacting additional file systems) the client performs the desired operation. This introduces an additional RPC in the common case (when links are not present) so an optimization is to optimistically perform the operation first. If it was unsuccessful due to the presence of an external link the Namenode notifies the client, using an exception, which is caught at the FileSystem level (since the link refers to another file system). The client then makes an additional call to the Namenode to resolve the link.* Once the path is fully resolved the operation can be retried. Note that if the operation contained multiple paths each may contain links which must be fully resolved before the operation can complete. This may require additional round trips. The call to resolve a link may fail if the link has been deleted, or is no longer a link, etc. In this case the resulting exception is passed upwards as if the original operation failed. As stated above, link resolution at the FileSystem level will perform a limited number of link resolutions before notifying the client of the failure.

          The Namenode's INodeFile class needs to maintain additional metadata to indicate whether a file is a symbolic link (or an additional class could be introduced).

          * NB: It is possible to eliminate this additional RPC by piggybacking the link resolution on the notification.
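
          To make the optimistic flow above concrete, here is a minimal client-side sketch, assuming an exception-based notification and a separate link-resolution call; all type and method names and the traversal limit below are illustrative stand-ins, not a committed API:

          // Illustrative sketch only: LinkAwareFs, UnresolvedLinkException, getLinkTarget
          // and the traversal limit are stand-ins for the API discussed in this issue.
          import java.io.IOException;
          import java.io.InputStream;

          class UnresolvedLinkException extends IOException {
          }

          interface LinkAwareFs {
            // Throws UnresolvedLinkException when the path crosses an external link.
            InputStream open(String path) throws IOException;

            // Returns the link target with the unresolved remainder of the path appended.
            String getLinkTarget(String path) throws IOException;
          }

          class LinkResolvingClient {
            private static final int MAX_LINK_TRAVERSALS = 32; // assumed client-side limit

            static InputStream openFollowingLinks(LinkAwareFs fs, String path) throws IOException {
              for (int hops = 0; hops < MAX_LINK_TRAVERSALS; hops++) {
                try {
                  return fs.open(path); // optimistic attempt; no extra RPC when no links are present
                } catch (UnresolvedLinkException e) {
                  path = fs.getLinkTarget(path); // resolve one external link, then retry
                }
              }
              throw new IOException("Too many symbolic links (possible loop): " + path);
            }
          }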

          Future work

          This design should cover the necessary use cases; however, some of the above features (like enhancing archives) may be deferred.

          Related future work is implementing atomic rename within a file system by atomically re-writing a symbolic link.

          Sanjay Radia added a comment -

          Symlinks to ".." may be tricky to make work since we fold dotDots when a Path is constructed, ie Path("foo/bar/../xxx") becomes "foo/xxx".
          The good news is that an initial dotDot (as in "../foo/bar") is left alone.
          Dhruba, you may want to watch out for this case in your code. DFSUtil#isValidPath() checks for dotDots as being invalid.
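
          A minimal sketch of the folding concern described above, assuming the Path constructor folds interior dotDots as stated (the behavior noted in the comments is what the comment above predicts, not something verified here):

          import org.apache.hadoop.fs.Path;

          public class DotDotFoldingExample {
            public static void main(String[] args) {
              // Per the comment above, an interior ".." is folded at construction time,
              // so a link target of "foo/bar/../xxx" may be stored as "foo/xxx" --
              // which changes meaning if "foo/bar" is itself a symlink.
              System.out.println(new Path("foo/bar/../xxx"));
              // An initial ".." (as in "../foo/bar") is reportedly left alone.
              System.out.println(new Path("../foo/bar"));
            }
          }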

          Eli Collins added a comment -

          Thanks Dhruba.

          Eli Collins added a comment -

          The symlink15.txt patches are rebased against trunk post project split. They have not been merged in the last couple of days, so they have conflicts with recent FileContext changes.

          dhruba borthakur added a comment -

          Hi Doug/Eli, if you have already split up the existing patch into three parts (common, mapred, hdfs), can you please post them to the JIRA? I will then implement the latest proposed changes and post a new patch.

          Konstantin Shvachko added a comment -

          > ... getLinkTarget(/a/b/c) would return foo:/z/c. Is that right?

          Exactly.

          > Konstantin, will you help implement (IV) for 0.21?

          I would be glad to implement this, but I am currently preoccupied with append. Maybe Dhruba wants to convert his patch to (IV); then append and symlinks can go in parallel. If not, I will be available at the beginning of October.

          Doug Cutting added a comment -

          Konstantin, will you help implement (IV) for 0.21? Dhruba's current patch implements (III). Eli volunteered to update that for the project split and add a design doc & tests, but converting it to (IV) is a larger task. Perhaps Eli is willing to attempt this too, I don't know, but he is less facile with HDFS than you, and such a substantial change at such a late date might be a stretch.

          Doug Cutting added a comment -

          > we don't know which part of the path is a link.

          Under (IV), if /a/b is an external link to foo:/z, then getFileStatus(/a/b/c) would throw a MountPointException, while getLinkTarget(/a/b/c) would return foo:/z/c. Is that right?

          Sanjay Radia added a comment -

          Konstantin> IV. Let the name-node throw MountPointException if the specified path contains an external symlink. The client catches MountPointException, calls a new method getLinkTarget() to resolve the link, and uses the resolved path further to perform the action - open() a file.

          Doug> I could live with this approach.

          I can live with the MountPointException and 2nd call to get the link info.
          Hard to believe that we did not find this middle ground earlier. So Doug, you are okay with using MountPointException as long as it does not return the symlink info inside the exception?

          Agree with Doug that getStatus() should suffice for the 2nd call.

          Konstantin Shvachko added a comment -

          Yes (IV) is an optimization of (I).
          I first thought getStatus() is the same as getLinkTarget(), but maybe it's not, because we don't know which part of the path is a link.
          If the link is in the middle of the src path, then having getFileStatus(src) return a FileStatus of the link inside instead of the status of the intended file may be confusing.
          So the idea of having a separate getLinkTarget() is that it can return the target of the link along with the portion of the path, which points to the link.
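
          A minimal sketch of the path arithmetic being described, using Doug's /a/b -> foo:/z example above; the helper name is hypothetical and only illustrates what a getLinkTarget() call would be expected to return:

          // Hypothetical helper, not part of any patch: splice the unresolved remainder
          // of the original path onto the link target, which is what getLinkTarget()
          // would return for a path containing an external link.
          public class LinkTargetExample {
            static String resolveThroughLink(String original, String linkPath, String linkTarget) {
              String remainder = original.substring(linkPath.length()); // e.g. "/c"
              return linkTarget + remainder;
            }

            public static void main(String[] args) {
              // /a/b is an external link to foo:/z, so resolving /a/b/c yields foo:/z/c.
              System.out.println(resolveThroughLink("/a/b/c", "/a/b", "foo:/z"));
            }
          }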

          Doug Cutting added a comment -

          > a new method getLinkTarget() to resolve the link

          Isn't that just getStatus()? The status should include whether the file is a link and, if it is, the target of the link, no?

          Doug Cutting added a comment -

          Sanjay> Exceptions should match the layer of abstraction.

          I agree. The question is whether symbolic links are first-class data in the FileSystem SPI. If they are, then they ought to be returned as normal values, no?

          Konstantin> IV. Let the name-node throw MountPointException if the specified path contains an external symlink. The client catches MountPointException, calls a new method getLinkTarget() to resolve the link, and uses the resolved path further to perform the action - open() a file.

          I could live with this approach. It's an optimization of (I). Clients could always instead call getStatus() first, resolving links and avoiding MountPointException by ensuring they only call open() on paths that are non-links. If we find that this optimization is insufficient then we could further optimize the API later to avoid a separate call for link resolution.

          Sanjay Radia added a comment -

          Sorry for my delayed response on this Jira - I was occupied for the 0.21 release.

          Exceptions should match the layer of abstraction. Often an exception from a lower layer is converted to another more appropriate exception and passed up. Similarly one often consumes an exception thrown by a lower layer below. For example in a multi device storage system a storage device may throw an exception that it does not have any free space and a middle layer may consume that exception and reapply the operation at a different device.

          I see the FileSystem layer as being independent from its implementations, such as a KFS server or a NN server. Each abstraction layer should stand on its own and use exceptions appropriate for that layer. From the NN's perspective, symlinks that cannot be followed, such as those to KFS or to other NNs, are an abnormal condition. Hence from the NN's perspective it is reasonable to use exceptions to convey this abnormal condition. Hence the RPC interface (i.e. the NN's interface) can throw exceptions for foreign symbolic links. The FileSystem layer decides whether it can process such an exception or convert and re-throw it as a FileNotFoundException.

          I have been very consistent in my comments earlier that my preference for exceptions is NOT because I am too lazy to use the alternate solution; use of exceptions is appropriate in this situation. Further, if the alternate solution did not make the interfaces so ugly, or if it simply affected one or two methods, I would have been able to accept the alternate solution.
          Our current interface is reasonably clean - very similar to other file systems and hence familiar to a FS user and developer. My objection is that the other approach makes the interface unnatural and its implementation harder to maintain.

          eric baldeschwieler added a comment -

          Hi Folks,

          It seems folks' positions on this issue have not changed since the bug was first discussed many moons ago. Doug has objected to the use of exceptions since it was first proposed. Several committers have objected to the non-exception-based approach since it was first proposed.

          We seem to have no mechanism for resolving such disputes. It appears this feature has missed 21 as a result, which is a pity since everyone agrees that this would be a useful feature. Given the high degree of interest in the feature, I'd suggest that we can probably vote it into the 21 release in a week or two if we can get to a consensus on the design.

          How do we get past the veto impasse?

          E14

          Konstantin Shvachko added a comment -

          Then it should be easy to propagate UnresolvedPathException from the patch to the FileSystem level?
          This will simplify the patch a lot.

          Doug Cutting added a comment -

          My position is the same as I stated in: https://issues.apache.org/jira/browse/HDFS-245?focusedCommentId=12638334&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12638334
          Konstantin Shvachko added a comment -

          > The contract of a FileSystem implementation, from the outside, in the presence of links, is to map paths to either bytes or to an external link. How it chooses to implement that internally is a separate issue.

          If this means NameNode and DistributedFileSystem may throw an exception in case of paths with external links, which is handled on the FileSystem level, then that resolves the issue for me.

          Konstantin Shvachko added a comment -

          > My personal preference is approach

          I am in favor of approach IV (four, 4, the one with MountPointException).
          Thanks for correcting, Eli.

          Eli Collins added a comment -

          Hey Konstantin,

          Thanks for posting the overview. Looks good, though it's not just Doug who would prefer not using exceptions for normal control flow. But aside from the issue of the use of exceptions, is there consensus, eg on the user-visible behavior of the latest patches? If so, sorry, it wasn't clear from the comments.

          > There were no explanations on the fact of reassignment of this jira. Eli, are you committing to support this
          > patch or do you see your mission to provide documentation and testing?

          Apologies for the missing context. Dhruba hasn't had the cycles to close the jira and I offered to help out. It wasn't clear if people were OK with the latest patches so I'll work on finishing them up so they're acceptable to people. Yes, I'm committed to supporting the feature.

          > My personal preference is approach VI.

          Do you mean approach V?

          It seems the use of symbolic links will be a common case (eg you can imagine them being used ala google namespaces) and therefore it would be nice to avoid multiple round trips to the NN.

          Thanks,
          Eli

          Doug Cutting added a comment -

          > Doug's counter argument to this is that the whole HDFS is a single monolithic layer, which should handle symlinks and therefore this is not an exceptional condition for any sub-layers of HDFS.

          That's true. But the point I feel more strongly about is that, in the SPI for FileSystem implementations (local, HDFS, S3, etc.), links are normal. If HDFS uses exceptions internally for this that's not great. (It actually can't easily use exceptions across RPCs anyway since Hadoop's RPC mechanism has poor support for exceptions and should probably catch them serverside, bundle them into a union-like datastructure and re-throw them on the other side.) But for HDFS to communicate the presence of an external link to the generic FileSystem layer through an exception is much worse. The FileSystem API is arguably Hadoop's most central API and should be held to a very high standard. The contract of a FileSystem implementation, from the outside, in the presence of links, is to map paths to either bytes or to an external link. How it chooses to implement that internally is a separate issue. Exceptions are a poor choice at either level, but I have a much harder time tolerating them at the upper level.

          Eli Collins added a comment -

          I think well designed APIs don't require catching exceptions for normal control flow. My guess is that they chose to throw a SecurityException because they wanted to make failing permission checks impossible to ignore (you have to either catch or rethrow), whereas it's much easier to ignore a function return value, leading to a security flaw. Perhaps that's the right trade off in this case. Also, perhaps continuing to throw an exception like AccessControlException is reasonable given that it's the convention for a library used extensively. I also don't consider java.lang to be the final word on what's best for a given system (just look how it's evolved), you have to consider the trade off. For example, if exceptions turn out to be pretty slow that may not be a big deal for SecurityException since it's not thrown frequently in the common case, but not at all good for UnresolvedPathException since it may be thrown heavily (eg if the root NN has nothing but symlinks to other NNs).

          So in this case, and in general, I think the above is a good api design principle. There are real reasons why exceptions are not appropriate for normal control flow; the tradeoffs need to be considered for the given case. In this jira the only argument I read for throwing the exception is that it means not modifying a lot of function prototypes, which, to me, does not justify bending a good principle.

          Thanks,
          Eli

          Konstantin Shvachko added a comment -

          There was no explanation given for the reassignment of this jira. Eli, are you committing to support this patch, or do you see your role as providing documentation and testing?

          Konstantin Shvachko added a comment -

          My opinion as I expressed it previously in this jira:

          • HDFS is not a single monolithic layer. It consists of several abstract layers: name-node, client, FileSystem.
            Most of them, up to the top one, do not have a clue what to do with a link to an external file system, and therefore should treat it as an exceptional condition and throw a Checked exception.
          • The current exception-less implementation substantially complicates (pollutes) the file system api through all of its layers.
            Speaking as a developer, supporting this code will be a tough effort. Understandability of the code suffers as well.
          • My personal preference is approach IV. I think a checked exception is exactly what should be used here, but passing any data inside the exception is a bad practice. Performance-wise, paying the price of an extra RPC call to the name-node ONLY when the path REALLY contains the external link is justifiable.
          Konstantin Shvachko added a comment -

          Alan asked for a summary. Here it is, before history takes a new turn in this issue. I've tried to be objective.

          Overview of the SymLink proposal.

          The SymLink proposal HDFS-245 introduces symbolic links in HDFS, which can point to files or directories in the same (internal link) or a different (external link) file system.

          1. Internal link example /users/shv/test
          2. External link example hdfs://nn1.yahoo.com:50030/users/shv/data
            External links may also be viewed as mount points of other Hadoop file systems, i.e. those that implement the FileSystem API.

          Internal links can be resolved inside the name-node; there is no controversy here.

          External links should be returned to the HDFS client, which should connect to the linked file system on the other cluster and get information from there. The controversy here is how information about external symbolic links should be returned to, and handled by, the client.
          Note that such handling can only be done at the top FileSystem level, because lower-level abstractions represent a single Hadoop file system.

          The following major approaches have been considered.

          I. Replace each call to the name-node with two: the first resolves the input path if it is a symbolic link or does nothing if the path is local, the second performs the required action, like open() a file.
          II. Each name-node operation with a path parameter throws an UnresolvedLinkException if the path crosses an external link (a mount point). The exception contains a new path pointed to by the symlink, which the client uses to repeat the call with a different file system.
          III. Every call to the name-node returns a result, which incorporates a potential symlink resolution, if any, in addition to the actual return type of the method. Thus the result of a method is sort of a union of a potential link and the real type. See examples below.
          IV. Let the name-node throw MountPointException if the specified path contains an external symlink. The client catches MountPointException, calls a new method getLinkTarget() to resolve the link, and uses the resolved path further to perform the action - open() a file.
          V. Similar to (III) but the potential symlink resolution is hidden in a thread local variable.

          Most people agreed that (I) doubles the access time in the regular case, when there are no symlinks. Doug proposed at some point to benchmark it before ruling it out, since the latest benchmarks show the name-node is fast.
          (II), (III) and (V) do not require extra RPC calls.
          (IV) does not require extra RPCs in the regular (no symlinks) case, but does require resolving external symlinks explicitly when one is on the path.

          The latest patch submitted by Dhruba implements (III). There were several iterations over this. The latest API looks like this:

          FsResult<LocatedBlocks> getBlockLocations(path, ...) throws IOException;
          FsResult<FileStatus> getFileInfo(String src) throws IOException;
          FsResult<BooleanWritable> delete(String src, boolean recursive) throws IOException;
          FsResult<NullWritable> setOwner(String src, ...) throws IOException;
          

          This is much better than the starting patch, which defined a separate union class for every return type, but still, replacing a simple void return type with FsResult<NullWritable> and Boolean with FsResult<BooleanWritable> looks controversial.
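
          For comparison, a rough sketch of what approach (II) would look like on the client side is given below. It is illustrative only and not taken from any attached patch; the exception fields, the NameNodeStub interface, and the authority parsing are assumptions made for the example.

            import java.io.IOException;
            import java.util.HashMap;
            import java.util.Map;

            /** Illustrative only: a checked exception that carries the link target (approach II). */
            class UnresolvedLinkException extends IOException {
              private final String target;
              UnresolvedLinkException(String target) {
                super("Unresolved external link: " + target);
                this.target = target;
              }
              String getTarget() { return target; }
            }

            /** Stand-in for the name-node side of ClientProtocol. */
            interface NameNodeStub {
              String getFileInfo(String src) throws IOException;
            }

            /** The client-layer boilerplate that approach (II) repeats for every operation. */
            class LinkFollowingClient {
              // Keyed by authority, e.g. "nn1.example.com:8020".
              private final Map<String, NameNodeStub> clusters = new HashMap<String, NameNodeStub>();

              void addCluster(String authority, NameNodeStub nn) {
                clusters.put(authority, nn);
              }

              String getFileInfo(String authority, String src) throws IOException {
                try {
                  return clusters.get(authority).getFileInfo(src);
                } catch (UnresolvedLinkException e) {
                  // Retry against the cluster that owns the target. A real implementation
                  // would loop with a hop limit, since the target may itself cross another
                  // external link.
                  String target = e.getTarget();
                  return clusters.get(authorityOf(target)).getFileInfo(target);
                }
              }

              private static String authorityOf(String path) {
                // e.g. "hdfs://nn1.example.com:8020/a/b" -> "nn1.example.com:8020"
                int i = path.indexOf("://") + 3;
                return path.substring(i, path.indexOf('/', i));
              }
            }

          The point of contention is exactly that catch block: under (III) the same information comes back as part of the FsResult return value instead.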

          • Doug opposes (II) and (IV) on the basis that "Exceptions should not be used for normal program control."
          • He refers to ["Best Practices for Exception Handling":http://onjava.com/pub/a/onjava/2003/11/19/exceptions.html]
            This lists just three cases where exceptions are appropriate: programming errors, client code errors, and resource failures, which the links are clearly not.
          • I, Sanjay, et al. argue that this and other sources mention and accept the usage of so-called checked exceptions, which "represent invalid conditions in areas outside the immediate control of the program". UnresolvedLinkException and MountPointException fall under this category, because the name-node, being unaware of other clusters, cannot immediately handle external symlinks.
          • Doug's counter argument to this is that the whole HDFS is a single monolithic layer, which should handle symlinks and therefore this is not an exceptional condition for any sub-layers of HDFS.
          Tsz Wo Nicholas Sze added a comment -

          > Currently for example checkPermission returns void but throws AccessControlException ...

          Hi Eli, do you think that all the checkXxx methods in java.lang.SecurityManager are badly designed because they throw exceptions for normal control flow?

          Eli Collins added a comment -

          I mentioned C to say it's possible to write clean code that returns values via normal function call unwinding.

          Does anyone still think we should be using exceptions for normal control flow?

          Currently, for example, checkPermission returns void but throws AccessControlException and UnresolvedPathException. Both represent normal control flow (i.e. whether the user is using access control or symlinks). How about we instead return a value (e.g. similar to FsResult) to indicate the status of the operation? I realize that will be a sizable diff; it doesn't necessarily have to all go in with symlinks, and it would be great if we could stage it.
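
          A minimal sketch of that idea follows. It is illustrative only; the CheckResult and PermissionChecker names are assumptions, not anything in the patch. The trade-off is the one noted earlier: a status value, unlike a thrown exception, is easy for a caller to ignore.

            /** Illustrative only: a status object in place of AccessControlException / UnresolvedPathException. */
            final class CheckResult {
              enum Kind { OK, DENIED, UNRESOLVED_LINK }

              final Kind kind;
              final String linkTarget;   // non-null only when kind == UNRESOLVED_LINK

              private CheckResult(Kind kind, String linkTarget) {
                this.kind = kind;
                this.linkTarget = linkTarget;
              }
              static CheckResult ok()                     { return new CheckResult(Kind.OK, null); }
              static CheckResult denied()                 { return new CheckResult(Kind.DENIED, null); }
              static CheckResult unresolvedLink(String t) { return new CheckResult(Kind.UNRESOLVED_LINK, t); }
            }

            class PermissionChecker {
              /** Both access denial and an external link become ordinary return values. */
              CheckResult checkPermission(String path, String user) {
                // ... permission and symlink logic elided; the caller must switch on result.kind ...
                return CheckResult.ok();
              }
            }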

          Btw, I'll post a design doc soon which attempts to capture the jira so far, and follow up with a test plan. I want to make sure there's consensus there, since the discussion has spanned several months and there are still open questions.

          Thanks,
          Eli

          Joydeep Sen Sarma added a comment -

          the left side of my brain says don't comment, don't comment ..

          the right side says the comparison to C is entirely inappropriate. There are so many places in Java (everywhere!) where exceptions are thrown where in C one would have a return code. Integer.parseInt and NumberFormatException versus atoi anyone? There may be other ways of reasoning about this - but not comparison to C ..

          there i did it. stupid right side again.

          Konstantin Boudnik added a comment -

          I discussed the article Doug pointed to with some folks on the HotSpot team, and what I heard is basically this:

          • using exceptions for control flow is a bad idea (doh!)
          • exceptions are still fairly slow, although it's getting better because object creation is relatively cheap

          Also, exceptions tend to carry a good deal of information about the internal state of a system, which might get exposed by a determined 'user'.

          Philip Zeyliger added a comment -

          I'm not up to date on the current lore on exception performance, but Java exceptions tend to carry around stack frames of where they were thrown (Throwable has a handful of fields, including StackTraceElement[]), and those are certainly heavier than any return value object we might construct.
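
          For what it's worth, when an exception is kept purely as a control-transfer mechanism, the usual way to blunt that cost is to skip stack-trace capture. The class below is an illustration of that technique only; it is not taken from any patch on this jira.

            import java.io.IOException;

            /** Illustrative only: an exception that skips the expensive stack-trace capture. */
            class LightweightLinkException extends IOException {
              LightweightLinkException(String message) {
                super(message);
              }

              @Override
              public synchronized Throwable fillInStackTrace() {
                return this;   // do not record stack frames for this exception
              }
            }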

          Doug Cutting added a comment -

          > Also, is there any performance difference between throwing an exception vs returning a status code?

          Java exceptions are substantially slower than normal returns.

          I can't find a lot of benchmarks, but here's one that says it's 66x slower!

          http://stackoverflow.com/questions/299068/how-slow-are-java-exceptions/299315

          dhruba borthakur added a comment -

          I was of the opinion that using exceptions keeps the code cleaner and easier to read. But you raise a good question: "if HDFS were written in C, would we use setjmp/longjmp?" Definitely not. I am beginning to come around to the view that the code only looks cleaner when we use exceptions because of the deficiencies of the current API.

          Also, is there any performance difference between throwing an exception vs returning a status code?

          Eli Collins added a comment -

          Also, Sanjay made the good point several times above that FS and the NN shouldn't be inconsistent with respect to their use of exceptions, i.e. UnresolvedPathException is not a good use of exceptions for the same reason--it's possible to write clean code that returns values via normal unwinding (i.e. if the NN were written in C, I don't think anyone would be arguing that we use setjmp/longjmp here).

          Eli Collins added a comment -

          +1 for not using exceptions.

          In my experience, and to quote Josh Bloch, "a well-designed API must not force its clients to use exceptions for ordinary control flow". Resolving links in HDFS paths, regardless of the component it's implemented in, will definitely be considered normal control flow (as opposed to, e.g., something that causes an IOException).

          The case being made for exceptions in this jira is that the return type of the current interface doesn't allow for any case besides returning, e.g., a block report; therefore exceptions are a low-impact way to propagate this new return type. This deficiency in the current API seems like a good reason for changing it rather than a motivation for working around it using exceptions. Whether the resulting patch is too invasive for 0.21 is a very reasonable concern, but hopefully an orthogonal one.

          Philip Zeyliger added a comment -

          In my opinion (I'd say IME, but I've never sneaked one of these past code review), using exceptions for flow control is shady. Joshua Bloch would say the same ("Don't force client to use exceptions for control flow"; see page 38, http://74.125.155.132/search?q=cache:MmIXuflmcvcJ:lcsd05.cs.tamu.edu/slides/keynote.pdf+joshua+bloch+exceptions&cd=4&hl=en&ct=clnk&gl=us&client=firefox-a or http://lcsd05.cs.tamu.edu/slides/keynote.pdf ).

          If the worry is about boilerplate, it seems like this exception would have to be checked — otherwise, people would not know that this exception is part of what they have to handle and implement — and nothing introduces misunderstood boilerplate quite as fast as checked exceptions.

          I can't think of many analogous interfaces here, the lack of which probably makes this argument more contentious. HTTP is vaguely similar: there's sometimes a full response, and sometimes there's a redirection. In most APIs wrapping HTTP that I've seen, you get a response object in response to a request. TCP/IP errors (socket timeouts, name resolution, etc.) throw exceptions, but nothing that has to do with HTTP does. One inspects response.status to see if it's 200 (ok), 301 (redirect), etc.

          – Philip
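
          To make the HTTP analogy concrete, here is a small, self-contained illustration using the standard java.net API (nothing HDFS-specific): the redirect is read off the response object rather than caught as an exception.

            import java.net.HttpURLConnection;
            import java.net.URL;

            public class RedirectProbe {
              public static void main(String[] args) throws Exception {
                HttpURLConnection conn =
                    (HttpURLConnection) new URL("http://example.com/").openConnection();
                conn.setInstanceFollowRedirects(false);   // surface 3xx responses to the caller
                int status = conn.getResponseCode();      // e.g. 200, 301, 302, ...
                if (status == HttpURLConnection.HTTP_MOVED_PERM
                    || status == HttpURLConnection.HTTP_MOVED_TEMP) {
                  System.out.println("redirected to " + conn.getHeaderField("Location"));
                } else {
                  System.out.println("status " + status);
                }
                conn.disconnect();
              }
            }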

          Doug Cutting added a comment -

          The most succinct test I've come up with for when it's appropriate to use exceptions is whether the target is known. If you know exactly what class and where on the stack an exception will be caught, then it's just a glorified goto statement working around poor API modeling. If you genuinely have no idea how a situation should be handled, then an exception is appropriate.

          In this case the exception will always be caught by FileSystem/FileContext and always be handled in precisely the same way by the same boilerplate code. There's nothing ambiguous about this situation that justifies exceptions.

          That's the negative case. The positive case is that we are adding the notion of symlinks to our FileSystem API, including cross-FileSystem symlinks. Thus, a FileSystem is no longer fundamentally just a mapping of path->bytes, but a mapping of path->[bytes|path]. This is a fundamental change to our FileSystem metaphor, and, as such, it requires a fundamental change to its SPI. The SPI can and should model this functionality directly with parameters and return values, not through values passed in exceptions.

          > It is a shame that this isn't looking like it will make 0.21's cutoff.

          I agree that it would be shameful if this does not make it into 0.21. The patch has been here for 9 months!

          Owen O'Malley added a comment -

          I'd like to bring up again the issue of throwing exceptions between the FileSystem implementations to notify the framework that there is a link to an external resource. That seems a lot cleaner than making every single method return a variant type that is sometimes the file handle (or other outcome of the method) and sometimes an address of the new path to look in. If the SPI always returned link info, I would lean the other way. But the fact that the SPI does dereference the link, except in the case of an external link, leads me to like the use of an exception as the mechanism to signal the condition.

          It is a shame that this isn't looking like it will make 0.21's cutoff.

          Allen Wittenauer added a comment -

          Can we get a summary of what those issues are rather than poring through all of the comments?

          eric baldeschwieler added a comment -

          Hi Folks,

          Expect more discussion here! I'll let others express their opinions, but just FYI, there are unresolved design issues here that folks on our HDFS team feel strongly about...

          E14

          Eli Collins added a comment -

          Discussed with Dhruba. I'll start putting together a design doc to cover the above questions/comments which deserve some more discussion. Will also start on a test plan and user guide.

          I rebased the latest diffs against trunk and have patches for hdfs, common and mapreduce. I'll file jiras for the last two and update each with the relevant patch.

          weimin zhu added a comment -

          This is a copy of symLink14.patch for the Hadoop 0.20.0 release.

          usage:
          put symlink-0.20.0.patch into $HADOOP_HOME

          cd $HADOOP_HOME
          patch -p0 < symlink-0.20.0.patch
          ant jar
          mv hadoop-0.20.0-core.jar hadoop-0.20.0-core.jar.bak
          cp build/hadoop-0.20.1-dev-core.jar

          Finally, restart the Hadoop system.

          Yajun Dong added a comment -

          @dhruba borthakur, Thanks for your comment.

          Can I ask whether this feature has been completed? I suppose the only things left are the test plan, negative unit tests, and performance tests, right?

          Which patch above shall I use to test it? And, by the way, which version do you think this patch will be merged into?

          dhruba borthakur added a comment -

          Hi Yajun, this patch might not be part of 0.21 because I am short on resources to write the test plan, negative unit tests, performance tests at scale, etc. If somebody can volunteer to write the test plan, user guide, and a few negative unit tests, that would be of great help (HDFS-254, HDFS-255).

          Yajun Dong added a comment -

          Hi, how is this feature progressing, and when do you expect to complete the patch?

          dhruba borthakur added a comment -

          The unit test TestLink was failing because the dfs.support.append was not set. Fixed.

          dhruba borthakur added a comment -

          Patch merged with latest trunk. There are still some unit test failures.

          Edward Capriolo added a comment -

          I am using/helping the hadoop-hive subproject. I wanted to share a use case for symlinks.

          For example, suppose a directory inside Hadoop, /user/edward/weblogs/, containing {web1.log, web2.log, web3.log}. I can use a Hive EXTERNAL table to point to the parent directory. I can then use Hive to query this external table. This is very powerful. This will work unless another file with a different format, web_logsummary.csv, is also in the directory (this is my case).

          Being able to drop in a 'symlink' where a file would go could be used to create structures from already existing data. Imagine a user who has a large Hadoop deployment and wishes to migrate to or start using Hive. An external table is constrained to one directory, so they would need to recode application paths and/or move files. With a 'symlink' concept, anyone could start using Hive without re-organizing or copying data.

          Right now, Hive has a lot of facilities for dealing with input formats, such as specifying delimiters, etc., but forcing the data either into a warehouse or into an external table is limiting. 'Symlinks', tied together with Hive's current input format capabilities, would make Hive more versatile.

          Sanjay Radia added a comment -

          Dhruba wrote:

          • Archives - current implementation does not follow symbolic links.
          • distcp - pure FileSystem client, does not follow symbolic links

          What is the desirable default behavior when a user does a distcp or an archive of a subtree that has symlinks to other volumes? To avoid accidental copying of large amounts of data, I suspect that the default should be to copy the symlink as a symlink rather than follow the symlink and copy its target. If this is indeed the desired default behavior, then the patch will need to fix that.

          On a related note, symlinks to an archive will throw an exception/error when one tries to follow them. That would require a fix to the archive file system; we should file a separate jira to fix that in the future.

          Nigel Daley added a comment -

          Dhruba, I'm having trouble enumerating all the test cases that are covered by the unit tests. Can you write a test plan that outlines the test cases? More than that, I'm having trouble understanding the design of this 200KB patch from the code and the last 2.5 months of comments on this jira. Is there a design document for this?

          At first glance, I don't see test cases for these:

          • deleting a symlink that points to a file
          • deleting a symlink that points to a directory
          • symlink to a symlink
          • any test cases with relative links
          • delete the linked-to file, but not the link, then access the link
          • delete the linked-to file, recreate the same file, then access the link
          • links to other filesystems
          • archives containing links
          • rename a file to an existing symlink
          • rename a symlink to an existing file
          • setTimes and setReplication cases should check the result on both the symlink and linked to file
          • tests involving lack of permissions to create symlinks
          • tests that try to createSymlinks with filesystems that don't support them
          • ...

          I'm also left with a number of questions:

          1. Do setReplication, setTimes, chmod, chgrp, and chown apply to the symlink or to the underlying file/dir that the symlink points to?
          2. Does fsck report symlinks?
          3. Does har:// protocol support symlinks in the path leading to har file? In har file itself?
          4. Why doesn't the main method of this patch, FileSystem.createSymlink(), which is declared to throw IOException, have any @throws clause in its javadoc?
          5. Under what conditions does FileSystem.createSymlink() throw IOException?
          ...and many more.

          Konstantin Shvachko added a comment -

          > 1. can re-run the test and post them here.
          Yes, that's what I mean: the results should be posted.

          > 2. every operation in DFSClient will have to convert an FsResultWritable object into an FsResult
          No, they don't, because FsResultWritable is a subclass of FsResult. So the RPC call returns FsResultWritable, and DFSClient can return it further up as FsResult; no conversion is needed.
          Please look at the implementation of DFSClient.setReplication() above.

          > 4. Sure, you can remove the method in a separate jira prior to this one.

          dhruba borthakur added a comment -

          1. Performance testing: I have run DFSIOTest on 10 nodes only. I did not see any performance degradation for any operations, including creates, deletes, reads, or writes. I can re-run the test and post the results here.

          2. The problem is that FileSystem.FSLinkResolver and FileSystem.resolve() will become more complicated. Also, every operation in DFSClient will have to convert an FsResultWritable object into an FsResult object.

          3. Fixed issues with FsResult constructor as you suggested.

          4. I would like to remove the deprecated method DFSClient.getHints as part of a separate JIRA.

          dhruba borthakur added a comment -

          Incorporate Konstantin's comments.

          Konstantin Shvachko added a comment -
          1. We should have some performance testing. There is a lot of wrapping/unwrapping going on. So we need to make sure that old operations like getBlockLocations(), create(), list() etc. on the FileSystem level perform the same as the old code.
          2. > I suggest we use Dhruba's implementation and do the improvement later.
            I don't understand what the problem is here.
            Nicholas proposes to have 2 classes, FsResult<T> and FsResultWritable<T>, which extends FsResult<T>.
            This is exactly what should be done here. ClientProtocol methods should return FsResultWritable<T>, while DFSClient and all other *FileSystem classes should return FsResult<T>. This is because Writable is needed only at the RPC level; after that, Writable is not required. E.g.
            interface ClientProtocol {
              FsResultWritable<LocatedBlocks>  setReplication(...);
              FsResultWritable<NullWritable> create(...);
            }
            
            class DFSClient {
              FsResult<LocatedBlocks>  setReplication(...) {
                return namenode.setReplication(...); // can do this because FsResultWritable is a subclass of FsResult
              }
            
              FsResult<OutputStream> create(...) {
                return new FsResult<OutputStream>(new DFSOutputStream(...));
              }
            }
            
          3. I did not review the code. Just looked in a few places.
          • Looks like FsResult constructors do not need to throw any exceptions.
          • DFSClient.getHints() is deprecated. It is a good time to remove it.
          Tsz Wo Nicholas Sze added a comment -

          Talked to Dhruba offline. He mentioned that the idea in my previous comment is not easy to implement. I did not have much time to take a closer look. So, I suggest we use Dhruba's implementation and do the improvement later.

          dhruba borthakur added a comment -

          Fixed findbugs warnings. Also fixed the reason for the unit test failure: FileSystem.exists() was not implemented correctly in the earlier version of this patch.

          dhruba borthakur added a comment -

          Investigate failed unit test and findbugs problems.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12393857/symLink12.patch
          against trunk revision 713847.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 62 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 3 new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3588/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3588/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3588/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3588/console

          This message is automatically generated.

          dhruba borthakur added a comment -

          Incorporated Sanjay's review comments. The FileSystem.createSymLink() does take two parameters, but only the first one needs to be "resolved". The second one can be anything (just a blob) and does not need to be resolved. In fact, the second parameter could be a dangling non-existent path.

          Included test cases for all FileSystem APIs that were modified.

          From Rob's list of tests:

          • Shell commands – unit test included
          • Trash – unit test included (because Trash is a pure FileSystem client)
          • Archives – current implementation does not follow symbolic links.
          • distcp – pure FileSystem client, does not follow symbolic links
          • chmod, chown, chgrp APIs – unit test included
          • Rename - unit test included
          dhruba borthakur added a comment -

          Thanks Sanjay and Robert. Will work your feedback and post a new patch.

          Robert Chansler added a comment -

          Just a list of important test cases saved from a hallway conversation:

          • Shell commands
          • Trash
          • Archives
          • distcp
          • chmod, chown, chgrp APIs
          • Rename
          Sanjay Radia added a comment -

          Went through the patch. Looks good.
          Feedback:
          1) Why does createSymLink set the group name to "none" rather than passing a null parameter to use the group name from the parent dir, similar to the create and mkdir operations?

          2) rename and createSymlink both take two path parameters - either of the paths can traverse a symlink. createSymlink does not handle this while rename does.
          An alternate way to handle this (compared to how rename does it) is to have the FsResult return a flag saying which of the two paths crosses a symlink.

          3) getFileInode throws UnresolvedPathException. However, LeaseManager:findPath() was changed to throw IOException (it threw no exceptions before). findPath should be declared to throw only UnresolvedPathException.

          4) There is a test to check that symlinks are correctly persisted in the fsimage/edits logs.
          Please add a TestSymlinks that tests symlinks for each of the FileSystem methods that take one or more path names. For those that take multiple path names, the test should cover symlinks in either parameter.

          Tsz Wo Nicholas Sze added a comment -

          FsResult can be divided into two classes since there are two kinds of results, Writables (e.g. FsResult<BooleanWritable>) and non-writables (e.g. FSDataOutputStreamLink). I suggest the following layout:

          public class FsResult<T> implements FSLinkable {
            ...
          }
          
          public class FsResultWritable<T extends Writable> extends FsResult<T> implements Writable {
            ...
          }
          

          Methods like appendImpl(...) should return FsResult<FSDataOutputStream> and methods like setReplication(...) should return FsResultWritable<BooleanWritable>.

          Then, we don't need classes like FSDataInputStreamLink and FSDataOutputStreamLink. We also probably don't need FSLinkable anymore.

          4044_20081030spi.java shows the complete source code for FsResult and FsResultWritable. I have not removed FSLinkable yet.

          Doug Cutting added a comment -

          The src/core parts of this patch look good to me.

          One nit: since maxPathLinks is not settable, it should be a private constant MAX_PATH_LINKS for now. And 1000 seems way too big. Linux limited links to 5 for a long time, but Posix specifies a minimum of 8. So we should probably choose somewhere between 8 and 32.

          If folks find this to be a problem, we could make the limit configurable, or we could simply detect loops:

            T resolve(FileSystem fs, Path p) throws IOException {
              T in = next(fs, p);
              Path first = p;
              HashSet<Path> seen = new HashSet<Path>();

              // continue looping until there are no more symbolic links in the path
              while (in.isLink()) {
                seen.add(p);

                // construct the new path from the link target
                p = in.getLink();

                if (seen.contains(p)) {
                  throw new IOException("Loop in symbolic links " + first);
                }
                // resolve the next link in the chain
                in = next(fs, p);
              }
              ...
            }
          
          dhruba borthakur added a comment -

          The Client Protocol uses types of the form FsResult<BooleanWritable>, FsResult<LocatedBlocks>, etc. This patch is ready for review.

          Sanjay Radia added a comment -

          Doug says> Sanjay has argued that it is abnormal for a filesystem to link to a different filesystem. This misses the intended sense of "normal". The value of a link is normal data, under the control of the filesystem implementation, regardless of whether it points to a different filesystem.

          Don't want to restart the debate, but need to correct something that was attributed to me:
          I never said that symlinks to remote files were not normal. Processing them (i.e. following them to a different NN) is not normal for a NN. Some NNs may follow them through and some may not. Indeed DNS does both (follows or redirects the caller).

          Sanjay Radia added a comment -

          > I would like to avoid a design that incurs an overhead of an additional RPC every time a link is traversed.

          >+1. This will affect not only NNBench but all benchmarks including DFSIO and especially NNThroughputBenchmark.
          >GridMix and Sort will probably be less affected, but will suffer too.

          +1. I would also like to avoid an extra RPC, since avoiding one is straightforward.

          Doug >What did you think about my suggestion above that we might use a cache to avoid this? First, we implement the naive approach, benchmark it, and, if it's too slow, optimize it with a pre-fetch cache of block locations.

          Clearly your cache solution deals with the extra RPC issue.
          Generally I see a cache as a way of improving the performance of an ordinarily good design or algorithm. I don't like the use of caches as part of a design to make an algorithm work when alternate good designs are available that don't need a cache. Would we have come up with this design if we didn't have such an emotionally charged discussion on exceptions?

          We have a good design where if the resolution fails due to a symlink, we return this information to the caller. It does not require the use of a cache.
          We are divided over how to return this information - use the return status or use an exception.
          The cache solution is a way to avoid making the painful, emotionally charged decision for the Hadoop community.
          I don't want to explain the reason we use the cache to hadoop developers again and again down the road.
          We should not avoid the decision, but make it.
          A couple of weeks ago I was confident that a compromise vote would pass. I am hoping that the same is true now.

          Konstantin Shvachko added a comment -

          > I would like to avoid a design that incurs an overhead of an additional RPC every time a link is traversed.

          +1. This will affect not only NNBench but all benchmarks including DFSIO and especially NNThroughputBenchmark.
          GridMix and Sort will probably be less affected, but will suffer too.

          dhruba borthakur added a comment -

          I would like to avoid a design that incurs an overhead of an additional RPC every time a link is traversed. I will post a new patch that incorporates Nicholas's idea of FsResult<T> style return type classes for SPI methods. I am hoping that this will be acceptable to all.

          Raghu Angadi added a comment -

          > The same thing that happens if that change is made just after a file is opened. If you open a file, then someone else deletes it, subsequent accesses to that file will fail.

          Yes, that's an HDFS implementation problem which could be fixed. That is qualitatively different from similar issues 'forced' by the FS layer/API (which could also be fixed in future).

          I don't have a strong preference either way regarding the implementation.

          With the extra RPC, some of the negatives I see (nothing too urgent):

          • we won't see many negative effects since there are very few loads that are gated by the NameNode currently. That might change
          • Every couple of months we might need to answer a "why is this so..?" question on the list
          • In my experience, any extra work that could be avoided will end up being a problem in the future, especially in large distributed systems. So sooner or later we will end up 'correcting' it.
          Doug Cutting added a comment -

          > I am almost certain that it won't affect any benchmark other than NNBench.

          If that's really the case, what are we worried about here?

          > What happens if a link or directory is changed between these two operations? open() fails though it should not.

          The same thing that happens if that change is made just after a file is opened. If you open a file, then someone else deletes it, subsequent accesses to that file will fail. The namenode doesn't keep any state for files open for read, so a short-lived cache of block locations doesn't change things fundamentally.

          That said, the cache idea only works for open, and doesn't work for rename, delete, etc. In these cases we don't want to pre-fetch a list of block locations. So nevermind the cache idea anyway.

          The current options on the table seem to be:

          • Dhruba's patch modified to use Nicholas's idea of LinkResult<T> style, to avoid defining new return type classes for the SPI methods.
          • A less-invasive approach that requires two RPCs. We may later optimize this by converting FileSystem's API to use the above style, but we may not need to. We do need to be careful not to incompatibly change FileSystem's public API, but the SPI's not so constrained, since all FileSystem implementations are currently in trunk and can be easily maintained in a coordinated manner. In the meantime, we can start using symbolic links in archives, etc. while we work out if and how to better optimize them.

          Does that sound right?

          I don't have a strong preference. If I were implementing it myself I'd probably go for the simple approach first, early in a release cycle, then benchmark things and optimize it subsequently if needed. The risk is not that great, since we already have good ideas of how to optimize it. But the optimization will clearly help scalability, so it wouldn't hurt to have it from the outset either.

          FYI, I tried implementing my patch above as a LinkedFileSystem subclass, to better contain the changes. This turned out to be messy, since a LinkedFileSystem can link to an unlinked FileSystem. With the subclass approach this must be explicitly handled by casts and 'instanceof', while when FileSystem itself supports links this can be handled by default method implementations. So I am not convinced that a LinkedFileSystem subclass is a net win.

          Raghu Angadi added a comment -

          One of the issues with the 'resolve then open' approach is correctness. What happens if a link or directory is changed between these two operations? open() fails though it should not.

          Raghu Angadi added a comment -

          >First, we implement the naive approach, benchmark it, and, if it's too slow, optimize it with a pre-fetch cache of block locations.

          I am almost certain that it won't affect any benchmark other than NNBench.

          Doug Cutting added a comment -

          Sanjay> I really don't like the extra rpc [ ... ]

          What did you think about my suggestion above that we might use a cache to avoid this? First, we implement the naive approach, benchmark it, and, if it's too slow, optimize it with a pre-fetch cache of block locations.

          Sanjay Radia added a comment -

          Owen> I would probably use a thread local variable to measure the depth of the traversal. It isn't great, but it would be encapsulated in LinkResult.

          Doug> I see. Yuck.

          I second the Yuck.

          Doug> Perhaps we're prematurely optimizing here. Perhaps the extra RPCs to resolve paths won't hurt things that much.

          The above was in reference to Dhruba's approach of using the FSLinkResolver abstract class versus Owen's more inline approach.
          While Dhruba's FSLinkResolver is a little more complex, it is not so complex as to warrant doing the additional RPC instead.
          We might be able to further simplify it by using a LinkableFileSystem base class as Doug suggested.

          I really don't like the extra rpc for any method that has a pathname as a parameter regardless of whether or not a link is in the path.

          Tsz Wo Nicholas Sze added a comment -

          Sanjay> if someone happens to create a dir called "hdfs:" and ...

          Currently, ":" is probably not supported in path. See HADOOP-2066.

          Doug Cutting added a comment -

          Sanjay> if someone happens to create a dir called "hdfs:" and valid path foo/bar under it then the local file system will follow it and not return an error. Is this a concern?

          Folks would have to use a double-slash in the symbolic link (which unix ignores) for Hadoop to interpret it as an external link. So folks would have to use single-slash when linking to a directory whose name ends in colon to keep behavior consistent. That doesn't seem like a big imposition.
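          As a purely hypothetical illustration of that convention (not code from any attached patch), distinguishing external links could be as simple as checking for a scheme followed by a double slash:

            // Hypothetical helper: a link target counts as external only when a scheme is
            // followed by "//" (e.g. "hdfs://nn/path"); a single slash ("hdfs:/dir/foo")
            // is treated as an ordinary local path under a directory named "hdfs:".
            static boolean isExternalLink(String target) {
              int colon = target.indexOf(':');
              return colon > 0 && target.startsWith("//", colon + 1);
            }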

          Sanjay> Hence, on a "file does not exist" error, the client side lib will have to check if there are symlinks in the path.

          Good point. If we leverage Unix's automatic link following then we'd have to handle external links by catching FileNotFoundException. We could instead loop calling lstat(), following links that are internal, and returning when either a link to a non file:// uri or an actual file is found. This would require native code, but, this is a unix-only feature and we already provide native code for linux, so maybe that's acceptable. Or we could call out to a shell, but that would probably be too slow. Note that Java 7 adds API support for symbolic links, which will permit such looping logic, so that will provide a good way to implement this.

          FileNotFoundException doesn't actually make this any easier, since it doesn't contain the path we need, as far as I can tell. We'd still have to loop in native code or a shell script to find the external link, no? So maybe the native code route is best? We could implement just this bit of the Java 7 API, so when that arrives we can use it without changing much?
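          A rough sketch of the looping resolution described above, written against the Java 7 java.nio.file API mentioned (everything here is an assumption for illustration; details such as preserving the raw link text through Path normalization are glossed over):

            import java.io.IOException;
            import java.net.URI;
            import java.nio.file.Files;
            import java.nio.file.Path;

            public class LocalLinkResolver {
              private static final int MAX_LINKS = 32;   // assumed limit

              /** Follow local symlinks; stop at a real file or at a link to a non-file URI. */
              public static URI resolve(Path p) throws IOException {
                for (int i = 0; i < MAX_LINKS; i++) {
                  if (!Files.isSymbolicLink(p)) {
                    return p.toUri();                       // an actual local file or directory
                  }
                  Path target = Files.readSymbolicLink(p);
                  URI uri = URI.create(target.toString());
                  if (uri.getScheme() != null && !"file".equals(uri.getScheme())) {
                    return uri;                             // external link, e.g. hdfs://nn/path
                  }
                  // internal link: resolve relative to the link's parent and keep looping
                  p = p.getParent() == null ? target : p.getParent().resolve(target);
                }
                throw new IOException("Too many levels of symbolic links: " + p);
              }
            }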

          Allen Wittenauer added a comment -

          >(I guess unix thinks the : is simply part of the name.)

          Pretty much. ln doesn't validate the link at all, so you can put all sorts of interesting things there. It is up to the calling program or filesystem to verify the validity in its context. [For example, I think it is rsh: if you link a hostname to rsh, it will then rsh to that host.]

          It should also be noted that : is a valid directory separator under HFS+ (Mac OS X) as well. You're likely to see very bizarro results on those operating systems that use : as a metachar as a result.

          Sanjay Radia added a comment -

          Doug says: >Internal symlinks are supported automatically by local FS, at least on unix. And support for remote symlinks on the local FS might be very cool. Unix will gladly let me create symbolic links to an HDFS URI (Try 'ln -s hdfs://foo/bar foo; ls -l foo'). These cannot be resolved by a unix shell, but they could by 'bin/hadoop fs -ls', by MapReduce programs etc. One could, e.g., keep in one's home directory on unix links to the HDFS directories that one commonly uses.

          It would be very cool to support them. (I guess unix thinks the : is simply part of the name.)
          BTW if someone happens to create a dir called "hdfs:" and valid path foo/bar under it then the local file system will follow it and not return an error. Is this a concern?

          When trying to go through the remote mount symlink, the local file system reports that the file does not exist. Hence, on a "file does not exist" error, the client side lib will have to check if there are symlinks in the path. The only snag is that "file not found" errors will take a little longer to report as the
          library tries to check if there is a symlink in the path. Is this an issue? Most apps consider "file not found" to be an error and hence the speed of reporting that is not a concern; however, there are some apps that create files if they are missing and their performance will suffer a little.

          Doug Cutting added a comment -

          > I see you are changing the subject.

          I am saddened that you think that. I have tried to say the same thing in many different ways, to make myself better understood. I will say it one last time: I will veto any patch that uses exceptions to communicate links within the FileSystem implementation. Exceptions are an inappropriate mechanism for communicating normal, expected values. In the namesystem sub-component of HDFS, which stores links, and any other sub-components between that and the generic FileSystem API, links are normal, expected events, even links to different FileSystems. There. I've said it again. I am now done saying it. Do not expect me to respond again about the issue of using exceptions to communicate the presence of links in HDFS. My answer will not change. Let's move on.

          Konstantin Shvachko added a comment -

          Doug> Does that address your question?

          Yes it does in the way that I see you are changing the subject.
          It was about exception usage, now it is about monolithic layers of HDFS.
          Since programming-style correctness is not measurable, as opposed to performance, there is no definite answer on what is correct or incorrect, and therefore the discussion can clearly go on forever. If one argument is exhausted another can be raised or linked to the previous, and so on.

          Although the question you raise is interesting by itself.
          HDFS is not a single monolithic layer, the same way Hadoop is not a single layer. MapReduce is a layer on top of HDFS, but we do not impose knowledge of splits upon the file system. Going further up the stack, we also do not require HDFS to recognize HBase records or files corresponding to columns in a table store.

          HDFS layers are clearly different abstractions. E.g., NameNode does not know about input and output streams. Sure, the layers have common structures, say FileStatus is visible to all layers, but ContentSummary is constructed on the top FileSystem level.

          I don't think this argument works in favor of your approach and justifies passing link values through the whole hierarchy of HDFS layers.

          Doug Cutting added a comment -

          > I guess we need an openImpl(...) method. Otherwise, it would be an infinite loop.

          No, they have different signatures. I've attached a patch for FileSystem that implements this. For FileSystems that do not support links this is completely back-compatible. HDFS could implement this by adding an RPC per file before it is opened. If that proves too slow, it could perhaps optimize things by returning and caching the block list with the link information, so that a second RPC could be avoided.

          The patch is not complete, but it works. I've only implemented open. It's meant to be an illustration of a style that we might use.

          Doug Cutting added a comment -

          Konstantin> checked exceptions can be used when a particular layer does not have an immediate response to the condition causing the exception

          First, in this case, I consider HDFS to be a single layer, or module. HDFS controls all of its components. The calls on the stack above that are still part of HDFS are known to HDFS, are under HDFS's control and are modified in a coordinated manner, as a unit. In particular, all of HDFS knows that a link can be to an external FileSystem, since this is a generic property of all FileSystems. This is a safe assumption for any code in HDFS to make, and does not violate any abstraction boundaries within HDFS.

          An exception is justified when the best handler for it is unknown. It is not the immediacy of the handler, or the proximity of the handler, but whether the handler is fully determined. If the handler is known, then exceptions are just being used as a convenient message passing channel. When the destination is known, methods, parameters and return values are the preferred message passing mechanisms in object-oriented programming.

          Does that address your question?

          Konstantin Shvachko added a comment -

          I am replying to Doug's comment HADOOP-4044#action_12638405.
          Was thinking about it all night and have to admit I am not quite sure how to understand it.

          We were talking about the mount point exception handling on different layers of the implementation of HDFS.
          It looks like, based on the references you provide, checked exceptions can be used when a particular layer does not have an immediate response to the condition causing the exception.

          If your comment means that you can tolerate mount point exception on all levels of HDFS implementation up to (excluding) the FileSystem api, then I anticipate a big sigh of relief from a large group of developers. And we can turn to discussion of the FileSystem api return types, which Owen is already successfully conducting.

          Otherwise, my understanding of the comment is that you are introducing a new argument (line of defense) for your dual-return-type approach, and that means that we are back at the beginning and will have to start over the whole discussion again.
          Please clarify.

          Tsz Wo Nicholas Sze added a comment -

          > That last method should instead read ...

          I guess we need an openImpl(...) method. Otherwise, it would be an infinite loop.

          Doug Cutting added a comment -

          Oops. That last method should instead read:

          public FSInputStream open(Path p, ...) throws IOException {
            ResolvedPath resolved = getResolvedPath(p);
            return resolved.getFileSystem(conf).open(resolved);
          }
          
          Doug Cutting added a comment -

          Perhaps we're prematurely optimizing here. Perhaps the extra RPCs to resolve paths won't hurt things that much. The namenode benchmarks I've seen show that it handles 50k opens/second. How much slower would that be if we always first resolved the path? In this naive approach we'd add a method which must always be called before a file is opened (or renamed, etc.) that would return an 'isResolved' boolean and a path. If the path is resolved then it can be used, otherwise we must resolve it further. The resolution loop could then be independent of the FileSystem method, so no ThreadLocal or other mechanism would be required to maintain this state.

          public class ResolvedPath extends Path {}
          
          public class LinkResolution {
            public boolean isResolved() { ... }
            public Path getUnresolvedPath();
            public ResolvedPath getResolvedPath();
          }
          
          public class FileSystem {
            ...
            public abstract LinkResolution resolveLinks(Path p);
            public abstract FSInputStream open(ResolvedPath p);
          
            private ResolvedPath getResolvedPath(Path p) throws IOException {
              LinkResolution resolution = resolveLinks(p);
              int count = 0;
              while (!resolution.isResolved() && count++ < MAX_LINKS) {
                p = resolution.getUnresolvedPath();
                resolution = p.getFileSystem(conf).resolveLinks(p);
              }
              return resolution.getResolvedPath();
            }
          
            public FSInputStream open(Path p, ...) throws IOException {
              return open(getResolvedPath(p), ...);
            }
           ...
          }
          

          No exceptions, no thread locals, no anonymous classes, etc. Should we benchmark this before we rule it out?
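          For a FileSystem that has no links at all, resolveLinks could simply report every path as already resolved, so the loop above exits immediately. A hypothetical sketch, assuming LinkResolution can be implemented anonymously and that ResolvedPath has a Path-copying constructor:

            // Hypothetical default for link-free file systems: every path is already resolved.
            public LinkResolution resolveLinks(final Path p) {
              return new LinkResolution() {
                public boolean isResolved() { return true; }
                public Path getUnresolvedPath() { return null; }               // never consulted
                public ResolvedPath getResolvedPath() { return new ResolvedPath(p); }
              };
            }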

          Doug Cutting added a comment -

          Owen> I would probably use a thread local variable to measure the depth of the traversal. It isn't great, but it would be encapsulated in LinkResult.

          I see. Yuck. It wouldn't in fact be entirely encapsulated, since it would assume that all the resolution of a path happens in the same thread. Thus we'd want to mark each of the FileSystem methods that uses a LinkResult final, add some comments noting that the LinkResult API has this restriction, etc. (It'd be odd to ever try to run these in different threads, but folks have tried odder things!) In general, thread locals seem like a poor design choice for this. They're best for things like caches, to avoid thread contention, rather than for storing program state, no?

          We could instead perhaps add something like a LinkResolution object that keeps a counter internally that is incremented as the link is resolved. The SPI would pass this around rather than a Path? Is that better than Dhruba's patch?
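          A hypothetical sketch of such a counter-carrying resolution object (names and the depth limit are assumed, and it differs from the LinkResolution class sketched elsewhere in this issue):

            // Hypothetical: created once per FileSystem call and passed through the SPI in
            // place of a bare Path, so the depth limit needs no thread-local state.
            public class LinkResolution {
              private static final int MAX_LINKS = 32;   // assumed limit
              private Path path;
              private int hops = 0;

              public LinkResolution(Path start) { this.path = start; }

              public Path getPath() { return path; }

              /** Record that a link was followed; fail if the chain is too long. */
              public void follow(Path target) throws IOException {
                if (++hops > MAX_LINKS) {
                  throw new IOException("Too many levels of symbolic links: " + target);
                }
                path = target;
              }
            }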

          Doug Cutting added a comment -

          > the local file system will want to support the normal dot-relative symlinks but perhaps not the remote one (ie the mounts ).

          Internal symlinks are supported automatically by local FS, at least on unix. And support for remote symlinks on the local FS might be very cool. Unix will gladly let me create symbolic links to an HDFS URI (Try 'ln -s hdfs://foo/bar foo; ls -l foo'). These cannot be resolved by a unix shell, but they could by 'bin/hadoop fs -ls', by MapReduce programs etc. One could, e.g., keep in one's home directory on unix links to the HDFS directories that one commonly uses.

          Sanjay Radia added a comment -

          Owen, my first instinct was to implement it the way you suggested. However, once you add loop detection you may want to consider using a structure similar to the one that Dhruba used to share that code. The addition of FOOImpl for each method does clutter the FileSystem code.
          Doug's suggestion of LinkableFileSystem is interesting. Note that I don't think this is unique to HDFS; KFS may want to support this.
          BTW the local file system will want to support the normal dot-relative symlinks but perhaps not the remote ones (i.e. the mounts). So Linkable is not quite the right distinction (but that is just a question of naming).

          Owen O'Malley added a comment -

          Ok, I realized after I posted that in fact the LinkResult is not abstract.

          Doug> do you think this logic is unique to HDFS?

          Well, currently yes, but in the long term hopefully not. In my proposal, most of the work would happen in LinkResult.

          Doug> Also, the pseudo code above will cause a stack overflow for a circular link.

          I was assuming (and included in the comments) that getNewPath would throw if it was too deeply nested. I would probably use a thread local variable to measure the depth of the traversal. It isn't great, but it would be encapsulated in LinkResult.

          Doug Cutting added a comment -

          > clients are responsible to resolve links before using it

          That might be a reasonable contract, but correct use of the API would then involve extra RPCs to check links before trying to open them, no? But if it is expected that folks will regularly pass unresolved links, then we shouldn't use an exception, since callers of this method would be expected to immediately handle the case of unresolved links. There's no need for a throw, as there might be for, e.g., FileNotFound, which must be handled at some unknown layer above. Exceptions are a long-distance goto, a control mechanism that should be reserved for cases that require them, where the catcher is unknown.

          Tsz Wo Nicholas Sze added a comment -

          > It seems like the only motivation to call this an error is to justify using exceptions to handle links, not that it would in fact be a user or program error to call it in this way.

          There is nothing wrong with assuming parameter f is a non-linked path. For performance reasons, clients are responsible for resolving links before using it, since resolving links is expensive, especially for cross-file-system links.

          Suppose we have the following link
          hdfs://clusterA/project/xxx -> hdfs://clusterB/yyy

          If a client submits a job with hdfs://clusterA/project/xxx/input and hdfs://clusterA/project/xxx/output, all the input and output file accesses involve two clusters and require two calls. However, if the client uses hdfs://clusterB/yyy, only one cluster is involved.

          Doug Cutting added a comment -

          > we could define the API open(Path f) such that the parameter f is supposed to be a non-linked path.

          Does calling something an "error" really make it an "error"? It seems like the only motivation to call this an error is to justify using exceptions to handle links, not that it would in fact be a user or program error to call it in this way. Rather, it would be an expected, documented, supported event, but we'd call it an error anyway, because otherwise we'd have to actually model the functionality in our API.

          Doug Cutting added a comment -

          Owen, do you think this logic is unique to HDFS? If we add support for links to another FileSystem implementation would we want to share this logic with HDFS? If so, could we do it through a common base-class, like LinkableFileSystem. Then FileSystems that never intend to support links could continue to directly implement the end-user methods, while FileSystems that support links would extend the new subclass and implement the LinkResult<T> SPI methods. Is that consistent with your proposal?

          Also, the pseudo code above will cause a stack overflow for a circular link. Shouldn't we catch loops earlier than that? And if the loop-detection logic is non-trivial, won't we end up with either lots of duplicated code or something like the anonymous classes in Dhruba's patch?

          Owen O'Malley added a comment -

          Getting back to the nuts and bolts of this patch, I don't really like the structure of this implementation. I think the LinkResolvers and the myriad anonymous classes that extend it are very hard to read. I'd much rather have something like:

          abstract class LinkResult<T> {
            boolean isLink();
            T getResult();
            FileSystem getFileSystem() throws IOException;
            /** Throws an exception if too deep. */
            Path getNewPath() throws IOException;
          }
          
          class DistributedFileSystem extends FileSystem {
            ...
            public FSDataInputStream open(Path p, int buffer) throws IOException {
               LinkResult<FSDataInputStream> result = dfs.open(p, buffer, verifyChecksum, statistics);
               if (result.isLink()) {
                 return result.getFileSystem().open(result.getNewPath(), buffer);
               } else {
                 return result.getResult();
               }
            }
          }
          

          It is much easier to follow and understand and forces no changes on FileSystems that don't support links.
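          For what it's worth, a non-stream method would presumably follow the same shape; a hypothetical sketch for getFileStatus (the dfs.getFileStatus return type is assumed here):

            public FileStatus getFileStatus(Path p) throws IOException {
              LinkResult<FileStatus> result = dfs.getFileStatus(p);
              if (result.isLink()) {
                // delegate to whatever file system the link points at
                return result.getFileSystem().getFileStatus(result.getNewPath());
              } else {
                return result.getResult();
              }
            }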

          Tsz Wo Nicholas Sze added a comment -

          Doug> Links are clearly not programming errors or client code errors.

          It depends on how the API is defined, e.g. we could define the API open(Path f) such that the parameter f is supposed to be a non-linked path. An exception will be thrown if client code uses it with a linked path.
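          A hypothetical sketch of what such a contract might look like (the exception name follows the UnresolvedPathException mentioned earlier in this issue; everything else is assumed):

            /**
             * Open the file at f.
             * @param f a fully resolved path; it must not traverse a symbolic link
             * @throws UnresolvedPathException if f still contains a symbolic link;
             *         callers are expected to resolve links before calling
             */
            public FSDataInputStream open(Path f) throws IOException;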

          Doug Cutting added a comment -

          Konstantin> this argument is about the programming style rather than performance or anything else

          It's clearly about both of those. We wish to make a change motivated by performance, and the question is how best to implement that change. We have two proposals on the table, one that is less invasive but uses a discouraged style, and one that involves substantial changes but avoids that style problem. I tried earlier to move the discussion from style guidelines to the underlying reason for those guidelines (that an API's purpose is to model expected control and data flow), but that argument seems to have caused more confusion than it has helped, so I've mostly reverted to citing style guidelines.

          Konstantin> FileSystem is the first layer for which link values make sense.

          The fact that that layer is fully-resolved and known indicates that links should be a first-class value in the API. All of the other layers you present are part of a single module (HDFS) whose job is to present itself as a FileSystem. If HDFS is implementing the FileSystem API, then FileSystem's datatypes are not alien to HDFS. This is the same twist that Sanjay pursued. Yes, the link URI is to a file outside of that particular HDFS filesystem, but link URIs are normal data that HDFS accepts as parameters (when adding a symbolic link), stores, and returns (when dereferencing a link). So this is not an alien object or an unexpected situation. A link to another filesystem is a sensible, first class value within HDFS.

          Konstantin Shvachko added a comment -

          I understand your answer to my question HADOOP-4044#action_12638156 is that
          this argument is about the programming style
          rather than performance or anything else. Or would you characterize it differently?

          Quoting one of the links you provided:
          "Checked exceptions : represent invalid conditions in areas outside the immediate control of the program"

          If a program cannot handle some condition it throws a checked exception.
          So the question of whether the value of a link is normal data is actually program layer-dependent.

          • Can it be handled within the name-node? Yes for local links - handle them. No for external links - throw the exception.
          • Can it be handled by ClientProtocol? No. Throw the exception.
          • Can it be handled by DFSClient? No. Throw the exception.
          • Can it be handled by DistributedFileSystem? No. Throw the exception.
          • Can it be handled by FileSystem? Yes, because FileSystem caches file system implementations.

          FileSystem is the first layer for which link values make sense.

          Doug Cutting added a comment -

          I mistakenly attributed a comment above to Owen, when it was really Dhruba. Sorry! I should be more careful.

          Doug Cutting added a comment -

          Konstantin> open() does not literally return a boolean, you mean that throwing an exception should be treated as a (boolean) indicator of whether the link should be retrieved, right?

          Yes, that's what I meant. We'd be using an exception to do something that could equivalently be done with a boolean.

          > But the methods have always been conditional in that sense, because they are throwing AccessControlException, QuotaExceededException, and FileNotFoundException. And XMountPointException is just another exception of the same nature.

          I think its nature is different. The former are all cases where the long-distance throwing is valuable. These are intended to be caught and processed at a higher level. In the successful execution of a typical application one would not expect to see these. An XMountPointException on the other hand would never be handled at some unknown higher level, it's always handled at a known point. It is expected to occur frequently in the course of a typical application's execution as part of the intended control path.

          Doug Cutting added a comment -

          To make my position clear: I am -1 on using exceptions to signal links in the FileSystem SPI. I would also recommend against using them in the Namenode too, but, as I stated before, that's not a public API, and we can clean it up later. I believe we should not use exceptions for control flow there either, but would not veto this patch for that.

          Doug Cutting added a comment -

          Owen> I am not buying (yet) Doug's argument of saying that not-throwing exceptions inside the namenode is a better programming style.

          One could in theory use exceptions to handle any condition. For example, we could, if you attempt to open a directory, return the directory listing in an exception. But, rather, we prefer to use data structures unless exceptions are clearly more appropriate. So, the question is, when are exceptions appropriate?

          A primary advantage of exceptions is their long-distance throwing. The maxim is, "throw early", "catch late", since in most cases it's better to pass exceptions through and let the highest layers deal with them. So one sign that exceptions are appropriate is when you don't know who should handle them. That is not the case here. There's only one place where this is intended to be caught, in boilerplate methods of FileSystem.java.

          In http://www.onjava.com/pub/a/onjava/2003/11/19/exceptions.html they list just three cases where exceptions are appropriate: programming errors, client code errors and resource failures. Links are clearly not programming errors or client code errors. But are they resource failures? Examples of resource failures are things like network timeouts, out of memory, lack of permission, etc. These generally concern resources that are out of the direct control of the application, and hence present unexpected outcomes. Reinforcing that interpretation, one finds sentences like, "This is a misuse of the idea of exceptions, which are meant only for defects or for items outside the direct control of the program.", in one of the links I provided earlier. So, are links a condition outside the direct control of the program? Are they external resources? I don't see that. The same component that implements open() also directly implements the storage of the link. So the fact that something is a link is entirely within the filesystem's domain. So, again, using exceptions does not seem justified here.

          Sanjay has argued that it is abnormal for a filesystem to link to a different filesystem. This misses the intended sense of "normal". The value of a link is normal data, under the control of the filesystem implementation, regardless of whether it points to a different filesystem. As I tried to elaborate above, "abnormal", with regards to exceptions, is defined as out of the direct control of the module, or in error. One could reasonably call directories "abnormal" in some sense (they're not files), but they are not errors nor are they conditions out of the control of the application, nor are they issues that are best handled at some unknown spot higher on the stack. Directories, like links, are thus best modelled directly by the API, not as exceptions.

          The primary argument folks have provided for the superiority of exceptions is that they better preserve the current service provider interface, and that this interface more obviously describes the actions in question. It does describe the non-link part of these actions more concisely, but it hides the link part, so preserving that API is not a clear win.

          This proposed use of exceptions in fact seems to me a poster child for how not to use exceptions. If we go down this route, I don't see how we'll be able to argue against exceptions in other places. Non-local control is not required. No error is involved. External conditions are not involved. None of the standard reasons for using exceptions are met. Exceptions are used here purely as a convenient control flow mechanism. It's effectively setjmp/longjmp, whose use we should not encourage.

          Owen> we aren't changing the FileSystem API at all.

          We're not changing the public API much, but we are changing the service provider API substantially. We'd like most FileSystems to support links, and to do so they'll need to change what many methods return. It would be nice to be able to do this back-compatibly, without altering the existing API much. But I don't see a way to do that and still achieve our performance goals without setting a bad precedent for the appropriate use of exceptions.

          I'm not trying to be an uncompromising ass here. ("You don't have to try, it comes naturally", I hear you say.) If you look at the early history of this issue, you'll see that I only with reluctance and skepticism originally outlined the return type approach, because it is clearly a big, disruptive change that's hard to justify. But I really can see no other way that doesn't set precedents that we cannot live with as a project. I'd love for someone to provide a "hail mary" solution, that leaves the API's return types mostly alone, performs well and doesn't set a bad example, but until that happens I don't see an alternative. (Or for someone to convince me that this is actually a reasonable, appropriate, intended use of exceptions.) But until then, I think we just need to accept that FileSystem's SPI must get more complex in order to efficiently support links.

          dhruba borthakur added a comment -

          From my perspective, I am not buying (yet) Doug's argument of saying that not-throwing exceptions inside the namenode is a better programming style. If there is a proposal to change all internal namenode methods to return object-return-types instead of throwing UnresolvedPathException, then I am against it.

          However, for the RPC interfaces, I like Doug's programming style of returning object-as-return-types rather than throwing exceptions, especially in the case of symbolic links.

          Owen O'Malley added a comment -

          Doug,
          I don't see your point. Clearly this is not black and white at all or it wouldn't have caused so much disagreement among the developers. I still think the core disagreement is whether hitting a link is enough of an exception to be reasonably implemented as a Java exception. Most of the rest of it I consider supporting arguments. smile

          In particular, we aren't changing the FileSystem API at all. No client code should be able to detect the difference between the versions before and after. What is changing is the RPC layer and I agree that we effectively have a tagged union between the link result and the actual result. I still don't think either is very nice looking, which is why I was falling back to supporting arguments.

          I think defining result types will give us significant advantages going forward.
          1. It is easy to add additional information in a forward/backward compatible manner.
          2. It is declared and strongly type checked.
          3. It doesn't throw exceptions. grin

          Konstantin Shvachko added a comment -

          So the question is why this approach is not acceptable for you (if it is not)?
          Is it because the programming style is not good or is it the performance or is it something else?
          I am sorry that I could not understand that from your previous comments.

          Konstantin Shvachko added a comment -

          > All of these methods are now conditional. They now either return the declared value or a boolean

          To make sure we are on the same page - open() does not literally return a boolean, you mean that throwing an exception should be treated as a (boolean) indicator of whether the link should be retrieved, right?
          But the methods have always been conditional in that sense, because they are throwing AccessControlException, QuotaExceededException, and FileNotFoundException. And XMountPointException is just another exception of the same nature.
          I do not see it as a fundamental change to the FileSystem API.

          > When I referred to doubling the number of RPCs I was contrasting with the naive approach

          I think everybody agrees that the naive approach is unacceptable.

          > the link value would not be returned directly with the call to open() but would require a second RPC, right?

          Correct. The point is that there is no degradation of performance in the regular case, which is when there are no links or the link is local, but there is an extra rpc when the link points to another cluster.
          And I do not think we should optimize it, exactly because this leads to a fundamental change of the API, just as you describe.
          And because an application can avoid this inefficiency by directly resolving the link before calling open() if it knows about the mount point.
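
          A minimal sketch of that client-side pre-resolution, assuming a hypothetical resolveKnownMountPoint() helper (standing in for whatever getLinkTarget()-style call the application would use to resolve a mount point it already knows about; it is not part of the current FileSystem API):

          import java.io.IOException;
          import org.apache.hadoop.fs.FSDataInputStream;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          /**
           * Sketch only. resolveKnownMountPoint() is hypothetical; it represents however
           * the application chooses to resolve a mount point it knows about before open()
           * is issued, so open() itself never needs to report an unresolved link.
           */
          public abstract class PreResolvingClient {

            protected abstract Path resolveKnownMountPoint(Path p) throws IOException;

            public FSDataInputStream open(FileSystem fs, Path p) throws IOException {
              // Resolve the known mount point up front, so the open() RPC is handed a
              // path with no cross-filesystem link in it; no second round trip is needed.
              Path resolved = resolveKnownMountPoint(p);
              return fs.open(resolved);
            }
          }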

          Doug Cutting added a comment -

          > why don't we accept the solution that does not require any fundamental changes to the FileSystem API

          But it does change the API fundamentally. All of these methods are now conditional. They now either return the declared value or a boolean indicating that the declared value is to be ignored and that one should follow up with a request for the link value. That's a substantial change, that shouldn't be cloaked in exception handling.

          We might instead consider changing the contract of these methods to permit the return of a null value, and interpret a null value as the need to request the link value. Unfortunately that won't work with the boolean methods, and it's a fragile approach, since folks might forget to check for null. It's better to make it explicit with a datastructure.
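
          A minimal sketch of that explicit data structure, with illustrative names that are not from the patch: the call returns either the opened stream or the link target still to be resolved, and the caller branches on which one it got.

          import java.io.IOException;
          import org.apache.hadoop.fs.FSDataInputStream;
          import org.apache.hadoop.fs.Path;

          /** Sketch only: a tagged union holding either the opened stream or the
           *  link target that the caller still needs to resolve. */
          public class OpenResult {
            private final FSDataInputStream stream;  // non-null when the open succeeded
            private final Path linkTarget;           // non-null when a link was encountered

            private OpenResult(FSDataInputStream stream, Path linkTarget) {
              this.stream = stream;
              this.linkTarget = linkTarget;
            }

            public static OpenResult opened(FSDataInputStream s) {
              return new OpenResult(s, null);
            }

            public static OpenResult unresolvedLink(Path target) {
              return new OpenResult(null, target);
            }

            public boolean isLink() {
              return linkTarget != null;
            }

            public Path getLinkTarget() {
              return linkTarget;
            }

            public FSDataInputStream getStream() throws IOException {
              if (stream == null) {
                throw new IOException("unresolved link: " + linkTarget);
              }
              return stream;
            }
          }

          A caller then loops while result.isLink(), reissuing the request against result.getLinkTarget(); no exception is involved on the common path.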

          > My proposal does "not double the number of RPCs per file operation".

          I did not mean to imply that it did. I'm sorry that you thought I did. When I referred to doubling the number of RPCs I was contrasting with the naive approach of calling resolveLink() or somesuch explicitly before open(), so that open() is never passed a path that contains a link, as this is implemented in Unix and other VFSs.

          Your approach would add an RPC whenever a symbolic link is encountered over the other approaches, since the link value would not be returned directly with the call to open() but would require a second RPC, right?

          > I keep wondering what your selective criteria are for ignoring some arguments and answering others.

          What arguments am I ignoring? I am sorry if you feel I've ignored some of your arguments. Please highlight them, if you like.

          Konstantin Shvachko added a comment -

          Doug> Instead of direct functions that map parameters to values, we wish to implement conditional functions, that do one of two things, and indicate in their response which path was taken.

          My proposal allows us to keep the code within the current API style: "direct functions that map parameters to values".
          HADOOP-4044#action_12637702
          HADOOP-4044#action_12637610
          HADOOP-4044#action_12637240

          Doug> The alternative is to double or more the number of RPCs per file operation, which is unacceptable.

          My proposal does "not double the number of RPCs per file operation".
          HADOOP-4044#action_12637702

          So if this dispute is not about programming style, then why don't we accept the solution that does not require any fundamental changes to the FileSystem API and does not have performance penalties for the regular case? Or is it nevertheless about the programming style?
          I keep wondering what your selective criteria are for ignoring some arguments and answering others.

          Sanjay Radia added a comment -

          Assuming that the vote for not using exceptions for the rpc interface passes (I have voted +0), should we consider modifying the rpc interface a little further:

          • return all HDFS exceptions as part of the return status rather than as exceptions. The only exception returned is the IOException when the rpc fails for some reason.
          • this would make the approach consistent - all status and information returned from the NN is part of the XXXResult object for each method XXX.
          Doug Cutting added a comment -

          Stepping back a bit, I don't believe this dispute is fundamentally about programming style. That, like Owen's arguments about RPCs, is a supporting argument. The primary issue is that we're changing the FileSystem API in a fundamental way. Instead of direct functions that map parameters to values, we wish to implement conditional functions, that do one of two things, and indicate in their response which path was taken. This is a very different style, one that is forced on us for performance reasons. The alternative is to double or more the number of RPCs per file operation, which is unacceptable.

          If we're using a different style, then it should look different. It should not look like the direct mapping of the existing API. We should not attempt to hide this, as that will lead to confusion. We should make it apparent to readers of the code that these methods return not a single type of value, but one of two kinds of values. This is not a matter of exception style, it's a matter of expressiveness and readability. Code should make clear what it does. Style guidelines derive from this. The reason exceptions are discouraged for normal control flow is that they hide control flow. We are factoring our FileSystem API in a different way, and that factoring should be apparent to all as an obvious, natural attribute of the API.

          Sanjay Radia added a comment -

          Oops a few of my comments got deleted by mistake as I was doing cut/paste.

          -------------

          >My argument is that we should model normal functionality with normal programming: methods, parameters, return values and data structures. Do you disagree with this? Or do you think that links are not normal functionality?
          >You keep arguing that "exceptions can be used for recoverable conditions". Indeed they can. But is a link really best modeled as a "recoverable condition"? This sounds to me like a rationalization for using exceptions, not a reason to use exceptions over data structures.

          The way you have worded your questions makes it very hard for me to convey the alternative.
          So I respond in the best way I can (and you will not find the answer acceptable since I have said some of this before).
          I see following a symbolic link as an exceptional condition but not an error. The NN prefers not to follow remote symlinks.
          It is, however, a condition that can be recovered from by the client side.
          Either the use of exceptions or data types and parameters are ways to deal with this.
          I believe the use of exceptions in this situation is acceptable and, further, the better of the two alternatives.
          Further, I believe it leaves the method signatures intact and they continue to convey the function they previously did by their name and by the types of the in and out parameters. I like that property of the solution. The alternate solution forces me to create a whole bunch of new data types or overload existing ones. If exceptions were not available I would use the alternate solution.

          I don't like your use of the term "normal control flow". That term was used in one of your references for an example where an out-of-bounds exception was used to get out of a for loop at the array index boundary. Clearly the use of exceptions to exit a for loop at the array boundary is blasphemous.
          So the issue is that if I am forced to argue using the words normal or not normal it makes my point of view look idiotic - the use of those words
          in that example does not apply in this context.

          ----------------

          Sanjay Radia added a comment -

          Doug says>>>>>>
          >> Doug should we pay the cost and do it the "right way" now?
          >Yes, of course.
          >> I prefer to use exceptions [ ... ]
          >But why? You've said it's "cleaner" and "simpler". I don't follow your argument. "Simpler" could mean fewer lines of code. I don't think it will actually change the lines of code much. Changing the return types will change more lines of code, hence the change is more complex, but the resulting code will probably be around the same size, regardless of which approach is taken. "Cleaner" is subjective, like "better", and requires elaboration.
          >My argument is that we should model normal functionality with normal programming: methods, parameters, return values and data structures. Do you disagree with this? Or do you think that links are not normal functionality?
          >You keep arguing that "exceptions can be used for recoverable conditions". Indeed they can. But is a link really best modeled as a "recoverable condition"? This sounds to me like a rationalization for using exceptions, not a reason to use exceptions over data structures.
          >Please provide a reasoned explanation for your disagreement. Otherwise it is hard to move forward.

          Two-part answer below: (1) a response to the above, and (2) in the spirit of moving forward.

          This is a very one-sided evaluation of the discussion that has occurred above.
          First, I have said that we have a difference of opinion (essentially granting you that you have a reasonable point of view).
          I have not come out and said that you are not offering any reasonable arguments.
          Have reasonable arguments been offered for my side? Yes; you are just not willing to see them because you think the use of exceptions in this
          situation is blasphemous.
          I am not the only one who finds the use of exceptions clean and elegant and the alternate view not so; there are others who have
          said similar things.

          • Konstantine
          • Raghu
          • Dhruba - "Dhruba> Thus, exceptions could be used here [NN]. This keeps most of the HDFS code clean and elegant. " He however argues that we should not use exceptions for RPC. But he does state that use of exceptions make the NN clean and elegant.

          Hence many have provided counter arguments to your view.
          Others and I have tried to explain what we mean by "elegant and simple".
          Since several of us are saying the same thing, please grant us that we are seeing something you are not, even if we are all wrong.
          I agree that terms like simple and elegant are not as precise as one would like, but then neither are "hacky" or "perversion".

          What we have is a difference of opinion - both of us are convinced that the other approach is not good or is invalid. (BTW I don't believe
          your approach is invalid and wrong, I just prefer mine.)

          In the spirit of moving forward:

          • symlink vs link - I am okay with either but prefer symlink.
          • use of exceptions inside NN
            +1.
            I believe this keeps the code and the internal interfaces "simple and elegant" (others such as Dhruba have said the same).
            It has nothing to do with how much code needs to get changed.
            Let's agree to disagree here.
            Doug if you want to change the NN code to not use exceptions, I would like to have an opportunity to sit with you and make one last attempt
            to help you see my point of view or get convinced by yours. Either way, we can discuss this over a beer someday (I will buy).
          • Not using exceptions for RPC interface - use the patch approach or Owen's variation on it.
            I can live with this.
            I can justify not using exceptions here to some degree because the impact is limited and also some RPC systems do not have exceptions (not a very strong argument because we already use exceptions in our RPC interface)
            I am okay with the way the patch has done it or the alternative that Owen has offered.
            (It's six of one, half a dozen of the other.) As I noted, Owen's approach is potentially better for evolution and versioning.
          • Having the NN compose the symlink and the rest of the path
            I can live with either - I prefer to keep them separate and have the client side do the composition, as information is preserved, and furthermore it allows us to deal with other ways of composing the two parts (I have no concrete examples of needing this at this stage).
          Konstantin Shvachko added a comment -

          Doug> whether to use an exception or a data structure to pass the return value.

          !!! No need to pass the return value with the exception !!!

          public FileHandle open(Path path) throws IOException {
            while (path != null) {
              try {
                return open(path); // succeeds if there are no links in the path
              } catch (XMountPointException eX) { // thrown only if the mount point is crossed
                path = getLinkTarget(path); // one extra RPC: fetch the link target
              }
            }
            throw new IOException("could not resolve path");
          }
          

          There is no need to change the API with this; only add a new method getLinkTarget().
          And XMountPointException does not encode or pass any values.

          Raghu Angadi added a comment -

          I didn't mean to say the implementation was wrong. I meant that the implementation is a result of the interface.

          Doug Cutting added a comment -

          > So is replacing 'return false;' with 'return new FSLinkBoolean(false)'.

          That's not a fair comparison. I was talking about shorter names versus longer names. You are talking about throwing exceptions versus adding a bit of code to return a value. Both involve more-or-less code, but in one case that's the only issue, in the other there's a lot more to it.

          Owen O'Malley added a comment -

          I don't think it is reasonable at all to use an extra rpc to always test the existence of links.

          To me it boils down to three things:
          1. Hitting a link in a traversal isn't an error or an exceptional situation.
          2. Exceptions are not handled well by rpc.
          3. The logging by the rpc package is confusing for non-system exceptions.

          It seems like most of the debate is about #1, but it is almost entirely a style issue. You either think it is an appropriate use of exceptions or not. I haven't seen any people switching sides based on the arguments on this jira.

          On #2, currently all exceptions are stringified and sent as RemoteException. That isn't very useful for throwing and catching. If we switch to Thrift in the future, it would, again, be unclear what the exception semantics are. Return types are well understood and under Thrift are versioned, which would be a very good thing.

          On #3, we can probably use minor changes to rpc to avoid it. (A marker interface that says not to log it when thrown across RPC? That would still be awkward for java exceptions that we are re-using.)
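
          A minimal sketch of such a marker interface, purely illustrative (the QuietlyThrown name and the constructor shown are assumptions, not part of the patch; UnresolvedPathException is the exception type discussed elsewhere on this issue):

          import java.io.IOException;

          /** Hypothetical marker: the RPC server would skip its usual error logging
           *  for any exception type that implements this interface. */
          public interface QuietlyThrown {
          }

          /** An exception used only to signal "this path contains a link"; implementing
           *  the marker keeps it out of the server-side error logs when sent to the client. */
          class UnresolvedPathException extends IOException implements QuietlyThrown {
            public UnresolvedPathException(String path) {
              super(path);
            }
          }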

          Raghu Angadi added a comment -

          > Why can't HDFS change method names?

          Right. They can.

          Raghu Angadi added a comment -

          >[...] but a bit verbose in real code.

          So is replacing 'return false;' with 'return new FSLinkBoolean(false)'.

          And as developers we have to look at it and deal with it every day.

          Doug Cutting added a comment -

          > I was only wondering why we are ok with compromise in one aspect and not about the other.

          What compromise is this?

          > it makes perfect sense. My objection has been calling it open() and returning link some times.

          So it was all about the method name? Okay, then please suggest new method names and return type names, rather than suggesting we use exceptions to return normal data. I personally find the 'xOry' names useful in pseudo code, when I'm trying to make a point, but a bit verbose in real code. We don't call FileStatus NameAndModifiedDateAndOwnerAnd....

          > This also allows HDFS to use exceptions since we don't want to change the names.

          What? I don't follow your logic here. Why can't HDFS change method names?

          Raghu Angadi added a comment -

          > Do you agree with this assumption, or do you feel we should examine it more?

          I agree with the assumption. I was only wondering why we are ok with compromise in one aspect and not about the other.

          > I thought the point of contention is not the name of the method or return value but whether to use an exception or a data structure to pass the return value. (That pseudo code also referred to FileHandle, which is also not in the patch.)

          For me at least, the issue was returning something that does not suit the API. If we have a member called 'getNextLinkOrOpen()' that returns the next link or opens a file, it makes perfect sense. My objection has been calling it open() and returning link some times.

          Though I don't like the new names, +1 for the current patch if the names of the members are changed to reflect what they do. This also allows HDFS to use exceptions since we don't want to change the names.

          Doug Cutting added a comment -

          > so.. this is a performance compromise. But the arguments till now have been about "perfect" ...

          This issue has long been performance-related. The assumption is that it is unacceptable for the addition of symlinks to double or more the number of RPCs for common HDFS operations when no links are present. I have noted this assumption on several occasions and no one has disputed it. Do you agree with this assumption, or do you feel we should examine it more?

          > In the patch it is not called 'nextLinkOrOpenImpl()'; it is just called 'openImpl()'.

          I thought the point of contention is not the name of the method or return value but whether to use an exception or a data structure to pass the return value. (That pseudo code also referred to FileHandle, which is also not in the patch.) Would you like to propose better names for the methods and objects in the patch? I am not wedded to the names used in the patch, but perhaps we should resolve the exception-related issue first?

          > This is obviously an exaggerated example but hopefully makes a point.

          I'm not sure what the point is. If the RPC system wished to return some generic data for all protocols then it could not force a different return type, since return types are protocol-specific, so we'd probably need to add a method to the RPC runtime. If the situation is unusual, then an exception would be more appropriate.

          HTTP is perhaps an analogy. It uses the return code to indicate a redirect. URLConnection#getInputStream() doesn't throw an exception when it encounters a redirect. It either follows the redirect or returns the text of the redirect page, depending on whether you've set HttpURLConnection#setFollowRedirects. Redirects are part of the protocol. The HTTP return type is complex. That's the model we must move towards if we wish to handle links without adding lots more RPCs.
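
          For illustration only, a rough sketch of that analogy in Java (the helper name and URL are placeholders, not code from the patch):

          import java.io.IOException;
          import java.net.HttpURLConnection;
          import java.net.URL;

          // Sketch: the redirect travels in the normal return channel (a status code plus
          // a Location header), not as an exception. Returns the redirect target, or null
          // if the response can be read directly.
          static String resolveOnce(String location) throws IOException {
            HttpURLConnection conn = (HttpURLConnection) new URL(location).openConnection();
            conn.setInstanceFollowRedirects(false);        // surface redirects to the caller
            int code = conn.getResponseCode();
            if (code == HttpURLConnection.HTTP_MOVED_PERM
                || code == HttpURLConnection.HTTP_MOVED_TEMP) {
              return conn.getHeaderField("Location");      // the "link" to chase
            }
            return null;                                   // no redirect: read the body normally
          }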

          Raghu Angadi added a comment -

          Let's take another hypothetical case: our RPC layer gets rewritten and, for some reason, it wants to inform callers to retry. Will all the RPC calls have a prefix 'retryOr...()', and will all the returned values have an extra field to indicate whether the call needs to be retried? This is obviously an exaggerated example but hopefully makes a point.

          Also I realize that it is hard to be too objective about interfaces...

          Raghu Angadi added a comment -

          > I don't think anyone would argue that the values of isLink() and getLinkTarget()

          Yes. It is of course proper data for 'stat' or FileStatus.

          > But we don't want to make this many calls to our FS implementations, since they're RPCs. So, for performance,
          So this is a performance compromise. But the arguments until now have been about a "perfect" API/implementation for a clearly imperfect, compromised (but well-performing) solution.

          > Why does this suddenly make that same data non-normal?

          In the patch it is not called 'nextLinkOrOpenImpl()'; it is just called 'openImpl()'.

          Doug Cutting added a comment -

          > Another truism is that returning unrelated values is API pollution.

          Sure, but the values here are not unrelated. They're core.

          > I guess one end of the opinion is that returning a link is normal and the other end thinks it isn't.

          Okay, so, if that's the case, how is it abnormal? The traditional way to model this would be something like:

          public FileHandle open(Path path) {
            for (element in path) {                  // pseudo code: walk each path element
              FileStatus stat = fsImpl.getStatus(element);
              if (stat.isLink()) {
                return open(stat.getLinkTarget());   // resolve the link and recurse
              } else if (stat.isFile()) {
                return fsImpl.open(element);         // reached the file: open it
              }
            }
          }
          

          I don't think anyone would argue that the values of isLink() and getLinkTarget() are not normal data. But we don't want to make this many calls to our FS implementations, since they're RPCs. So, for performance, we want to package that same data in a different way, with something like:

          public FileHandle open(Path path) {
              LinkOrFileHandle linkOrFileHandle = getNextLinkOrOpen(path);
              while (linkOrFileHandle.isLink()) {
                 linkOrFileHandle = getNextLinkOrOpen(linkOrFileHandle.getLink());
              }
              return linkOrFileHandle.getFileHandle();
          }
          

          Why does this suddenly make that same data non-normal?

          Raghu Angadi added a comment -

          > Returning normal values through exceptions is a perversion.
          Hardly anyone disagrees with that. Another truism is that returning unrelated values is API pollution. I guess one end of the opinion is that returning a link is normal and the other end thinks it isn't. The issue is not about returning values vs. exceptions 'in general', but only about this particular case.

          Konstantin Shvachko added a comment -

          Dhruba> In current Hadoop RPC, the only way to serialize/deserialize an exception object is to serialize its message string.

          I am not sure my proposal was understood completely.

          • I have an exception XMountPointException, which does not encode the path (or any special fields) in its message.
          • If I call getFileStatus(src) and src crosses a mount point, XMountPointException is thrown by the name-node and passed through RPC to the application level.
          • The application catches XMountPointException and calls linkSrc = getSymLink(src), which returns the first link on the path.
          • The application then calls getFileStatus(linkSrc).
          • Encoding linkSrc inside XMountPointException would be an optimization: unnecessary here because crossing a mount point is not common.
          • If an application knows and expects to cross a mount point it can call getSymLink() first, avoiding catching XMountPointException.

          Is this the right usage of exceptions?
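
          For illustration, a minimal client-side sketch of the flow in the list above (XMountPointException and getSymLink() are the proposed names, not existing APIs):

          FileStatus stat;
          try {
            stat = fs.getFileStatus(src);
          } catch (XMountPointException e) {
            Path linkSrc = fs.getSymLink(src);   // first link on the path
            stat = fs.getFileStatus(linkSrc);    // retry against the resolved path
          }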

          Doug> we combined the value of many methods into the value of a single method with a very generic name.

          The point of RPC is to make transparent the types passed and returned by a method.
          As said, Object execute(Object request) replaces all RPCs with one generic call.
          It is like sending and receiving blob buffers and casting them into appropriate structures.
          Bad practice in C, unavoidable in Cobol, cannot qualify for a remote procedure call.

          Doug Cutting added a comment -

          > Doug should we pay the cost and do it the "right way" now?

          Yes, of course.

          > I prefer to use exceptions [ ... ]

          But why? You've said it's "cleaner" and "simpler". I don't follow your argument. "Simpler" could mean fewer lines of code. I don't think it will actually change the lines of code much. Changing the return types will change more lines of code, hence the change is more complex, but the resulting code will probably be around the same size, regardless of which approach is taken. "Cleaner" is subjective, like "better", and requires elaboration.

          My argument is that we should model normal functionality with normal programming: methods, parameters, return values and data structures. Do you disagree with this? Or do you think that links are not normal functionality?

          You keep arguing that "exceptions can be used for recoverable conditions". Indeed they can. But is a link really best modeled as a "recoverable condition"? This sounds to me like a rationalization for using exceptions, not a reason to use exceptions over data structures.

          Please provide a reasoned explanation for your disagreement. Otherwise it is hard to move forward.

          Sanjay Radia added a comment -

          Symlink vs link - I am indifferent and can go either way.

          One advantage of Owen's proposal over the current patch: it will help with versioning in the future as each method can evolve its return type
          independently of the return types of other methods.

          Dhruba> Thus, exceptions could be used here. This keeps most of the HDFS code clean and elegant.

          Doug> Yes, they could be used here, and it could be fixed later without breaking user code, but that doesn't make it clean and elegant.
          > Rather it is a quick and dirty hack. Returning normal values through exceptions is a perversion.
          > The Namenode should be revised to support links as first-class events, not as exceptions.

          It is highly unlikely to get "fixed" later and over time it will get harder and harder to "fix".
          Doug should we pay the cost and do it the "right way" now?
          (BTW I prefer to use exceptions and don't agree with Doug's view; hence my use of quotes above).

          Sanjay Radia added a comment -

          Dhruba says:
          >If an exception is thrown by the ClientProtocol, it is logged by the RPC subsystem into an error log.
          >This is a good characteristic to have in a distributed system, makes debugging easy because a scan in the error logs
          >pinpoints the exceptions raised by the API.
          >Access control checks or disk-full conditions raise exceptions, and they are logged by the RPC subsystem. We do not want every call to "traverse a
          >symbolic link" to log an exception message in the error logs, do we? (Of course, we can special case it and say that we will not log
          >UnresolvedPathException; but by special-casing it, we are acknowledging that this exception is not an abnormal behaviour).

          The NN (or any server) should log server errors (like disk full) but not user exceptions (like a non-existent path name being supplied).
          The reason access violation exceptions are logged is that they are needed for security audits - security is a special case and the info may even be logged in a separate file.
          Hence, in general, the server side needs to filter what is logged.
          If we had some inheritance hierarchy in our exceptions we could make the filter simpler.
          The special casing argument is not valid since we should add some sort of a filter anyway at some point.
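
          As a rough sketch of such a filter (RecoverableFSException is a hypothetical common parent, and the logger names are illustrative, not from any patch):

          void logRpcError(Throwable t) {
            if (t instanceof AccessControlException) {
              AUDIT_LOG.info("access denied", t);          // kept for security audits
            } else if (!(t instanceof RecoverableFSException)) {
              LOG.warn("RPC error", t);                    // genuine server-side errors
            }
            // expected, recoverable conditions (e.g. an unresolved link) are not logged
          }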

          Sanjay Radia added a comment -

          Sanjay says:
          >> Are you saying that getSymlink() can return file:///etc/dir/file.txt?

          >Yes.

          >Can the name node always know how to compose the two parts? I recall something bizarre about composing archive filesystem paths.
          >If there is a well-defined way of composing the two parts then I am fine with the alternative proposed above.
          >If not, the two parts may have to be passed to the file system.

          I checked - the above scheme will work with archives. It uses the standard separator.
          +1 on the getSymlink returning the composed pathname.

          Doug Cutting added a comment -

          Dhruba> the NameNode is not future-proof for backward compatibility.
          Dhruba> Thus, exceptions could be used here. This keeps most of the HDFS code clean and elegant.

          Yes, they could be used here, and it could be fixed later without breaking user code, but that doesn't make it clean and elegant. Rather it is a quick and dirty hack. Returning normal values through exceptions is a perversion. The Namenode should be revised to support links as first-class events, not as exceptions.

          Konstantin> This is universal, but not a transparent api anymore.

          It's no less transparent than 'FileStatus getFileStatus(Path)'. In that case we combined the value of many methods (isDirectory(), size(), owner(), etc.) into the value of a single method with a very generic name.

          In this case we wish to collapse three logical calls into one. We want to perform 'isLink()', and then either 'getLink()' or, e.g., 'open()', and return sufficient information so that the caller can tell what happened. In such cases, when multiple values must be grouped together to represent something, we use a data structure, not some other channel like exceptions or global variables. With a data structure comes some loss of transparency, also called abstraction.

          dhruba borthakur added a comment -

          1. "symlink" vs "link":
          I think it makes sense to call the current implementation "links" instead of "symlinks". None of the existing file system implementations has any support for any kind of link. It is OK for the first implementation to refer to this new construct as a generic "link". HDFS implements it as a symbolic link, but some other file system may implement "links" as hard links.

          +1 for calling this construct "link" instead of "symlink".

          2. Exceptions vs Objects-as-return-status in a public API (FileSystem or ClientProtocol API)
          Exceptions and object-as-return-value approaches are two ways of communicating a certain piece of information to the user of the API.
          (a) One goal is to discuss how we can attempt to make that API somewhat future-proof. If we consider our current Hadoop RPC, the only way to serialize/deserialize an exception object is to serialize its message string. The client side can de-serialize this string and reconstruct an exception object. If the return status needs to contain various different pieces of information, then serializing/deserializing it as a string inside the exception object is not very elegant. Many other RPC systems (e.g. Thrift) allow versioning objects (adding new fields) but many might not allow adding new exceptions to a pre-existing method call. Thus, making API calls return objects-as-return-types seems to be more future-proof than adding exceptions.
          (b) The rule of thumb we have been following is that exceptions are generated when an abnormal situation occurs. If an exception is thrown by the ClientProtocol, it is logged by the RPC subsystem into an error log. This is a good characteristic to have in a distributed system; it makes debugging easy because a scan of the error logs
          pinpoints the exceptions raised by the API. Access control checks or disk-full conditions raise exceptions, and they are logged by the RPC subsystem. We do not want every call that traverses a symbolic link to log an exception message in the error logs, do we? (Of course, we can special-case it and say that we will not log UnresolvedPathException; but by
          special-casing it, we are acknowledging that this exception is not abnormal behaviour.)

          +1 for RenameResult rename(Path src, Path dst) throws IOException;

          3. Exceptions vs Object-as-return-status inside the NameNode
          (a) Different filesystem implementations can have very different internals and very different sets of developers. For example, HDFS might implement code in such a way that traversing a link returns an object status, whereas S3 or KFS throws an exception (internal to the implementation). If we write a file system implementation for Ceph, we are unlikely to rewrite the Ceph code to not use exceptions (or vice versa). I would like to draw the distinction that this issue is not related to what is decided in case (2) above.
          (b) The primary focus for designing the internal methods of the NameNode is not being future-proof for backward compatibility. Also, there isn't any requirement to serialize/deserialize exception objects as long as those objects are used only inside the NameNode. Thus, exceptions could be used here. This keeps most of the HDFS code clean and elegant.

          +1 for Using Exceptions inside the NameNode internal methods.

          Konstantin Shvachko added a comment -

          Owen is proposing a reasonable rename of the FSLink*** classes. But the implementations of these classes remain the same.
          Going further this way we could make the API look like this:

          RenameReply rename(RenameRequest arg) throws IOException;
          

          This is universal, but not a transparent api anymore.
          The next step to completely generalize (and exaggerate) this would be to declare one single protocol method.

          Reply execute(enum opCode, Request arg) throws IOException;
          

          Then the api will have to be defined by a pair of Request, Reply classes.

          Sanjay Radia added a comment -

          Owen says:
          >My preference is toward not using exceptions for nominal cases, and I agree with Doug that links seem like nominal cases. Clearly, however, it isn't a slam dunk.
          >At the risk of crossing streams, I think that the classes like FSLinkBoolean are problematic. I think we are better off creating a type for each method's return type. I think that a lot of these return types should implement a class/interface (Owen ducks) that provides the common interface for dereferencing the links.

          What you lose with Owen's class/interface suggestion is that the signature no longer indicates
          that the method returns a boolean, an array of located blocks, a file descriptor, etc.
          I think Dhruba was trying to preserve that aspect of the original interfaces whereby, from reading the name of the method and the names and types of the
          input and output parameters, one can easily guess the semantics of the method.
          (the eraser I threw at Owen bounced off the screen).
          But Owen is right in that his approach is a little more future-proof (and exceptions are more future-proof).

          Owen what do you propose for the internal interfaces in the name node? – do you suggest that we remove the symlink-related exception and use your structure everywhere in the FS and HDFS code?
          If not, why the inconsistency?

          Sanjay Radia added a comment -

          Doug says:
          >> [minor] the method next() suggests that one follows the path to the next symbolic link. It does not strictly.

          >Can you elaborate? Each time this is called a link is resolved, returning a path that may contain more links, following a chain of links, no? Also, >'doOperation()' is next to meaningless. So if 'next' is indeed inappropriate we should replace it with something more specific, not something generic.

          First I had qualified this as minor (meaning it is just a minor suggestion for the author).
          When I read the code, I first thought that dhruba was processing all sym links and then doing the actual operation at the end.
          Turns out next() actually performs the operation. If the operation succeeds (and it will for most pathnames including for all dot-relative symlinks)
          all is well and done. Indeed if there is another NN in a chain of links that is willing to process remote symlinks internally all is fine and done.
          If the operations fails, then the recovery is performed.
          Now the design is quite elegant. Later we can add new reasons why the NN could not complete the operation, and the structure could be modified
          to recover from new conditions. I like that. The name next() suggests that one is following symlinks, so I was suggesting something like doOperation() or
          performOperation(), which I find more accurate than the generic "next()".
          Perhaps adding a comment to the abstract method may be sufficient. I am okay with whatever the author decides here - it is a very minor comment.

          Raghu Angadi added a comment -

          Doug wrote :
          > Exceptions are intended for cases where you're not sure who will catch them. [...]

          UnresolvedLinkException seems like such a case: when it is thrown, we don't know whether it will be handled inside the NameNode, or in DFS or FS on the NameNode.. it might be the same with other FS implementations too.

          Owen O'Malley added a comment - - edited

          My preference is toward not using exceptions for nominal cases, and I agree with Doug that links seem like nominal cases. Clearly, however, it isn't a slam dunk.

          At the risk of crossing streams, I think that the classes like FSLinkBoolean are problematic. I think we are better off creating a type for each method's return type. I think that a lot of these return types should implement a class/interface (Owen ducks) that provides the common interface for dereferencing the links.

          The reason for having

          RenameResult rename(Path src, Path dst) throws IOException;
          versus
          FSLinkBoolean rename(Path src, Path dst) throws IOException;
          

          would be that if we want our protocol to be future-proof, rename may well need to add additional return values. If we have a bunch of methods all returning the same type, that is hard.
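
          One possible shape of this, purely as a sketch (the names are illustrative, not from the patch): each RPC keeps its own return type, and all of them expose the link-handling methods through a shared interface.

          public interface LinkResolvable {
            boolean isLink();
            Path getLink();
          }

          public class RenameResult implements LinkResolvable {
            private final boolean renamed;
            private final Path link;                        // non-null only when a link was hit
            public RenameResult(boolean renamed, Path link) {
              this.renamed = renamed;
              this.link = link;
            }
            public boolean isLink() { return link != null; }
            public Path getLink() { return link; }
            public boolean isRenamed() { return renamed; }  // rename-specific value, free to evolve
          }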

          Thoughts?

          Sanjay Radia added a comment -

          Dhruba says:
          > I vote for using exceptions internal to the NameNode to keep internal interfaces clean and elegant.
          I agree with you that using exceptions keeps the NN interfaces clean and elegant.
          I further agree with Raghu that we need to be consistent here and extend your taste for "clean and elegant" to the RPC interface.

          Sanjay Radia added a comment -

          >> Are you saying that getSymlink() can return file:///etc/dir/file.txt?

          >Yes.

          Can the name node always know how to compose the two parts? I recall something bizarre about composing archive filesystem paths.
          If there is a well-defined way of composing the two parts then I am fine with the alternative proposed above.
          If not, the two parts may have to be passed to the file system.

          Konstantin Shvachko added a comment -

          > public class FsReturnType<T>

          Interesting idea. You would rather have something like

          public class FSReturnType<T extends Writable> implements Writable {
            public T getValue();
            Path getLink() throws IOException;
          }
          

          And implement serialization methods, and think about how to define a WritableFactory for the parametrized class.
          And then the protocol methods will look like this

          public FSReturnType<BooleanWritable> setReplication(String src,...); 
          

          This is better than creating a new class for every new return type, but seems rather complex compared to one new protocol method plus the UnresolvedPathException that I was proposing.
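
          For what it's worth, a rough sketch of what the serialization half might look like under these assumptions (it assumes fields boolean isLink, String link and T value, with value pre-constructed, e.g. via the factory mentioned above; none of this is in the patch):

          public void write(DataOutput out) throws IOException {
            out.writeBoolean(isLink);
            if (isLink) {
              Text.writeString(out, link);   // unresolved link path
            } else {
              value.write(out);              // the wrapped T
            }
          }

          public void readFields(DataInput in) throws IOException {
            isLink = in.readBoolean();
            if (isLink) {
              link = Text.readString(in);
            } else {
              value.readFields(in);          // 'value' must already exist (the factory problem)
            }
          }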

          Doug Cutting added a comment -

          > BTW Do you find such use of exceptions to be acceptable inside the name node?

          No.

          I thought I made it clear that I had not reviewed the namenode code changes, but only those in the public APIs.

          > No one is suggesting that we handle it as a single call

          Linux's VFS does not handle both open and isLink in a single call, the way we'd like to.

          > I am interpreting your reference URL link differently - I am reading it and seeing that using exceptions for recovery is acceptable.

          That's a separate issue. Recovery from an unexpected condition out of the control of the code in question. That's not the case here. Links are an expected, normal condition.

          Sanjay Radia added a comment -

          Dhruba, a question about your patch. Why do the data types LocatedBlocks, FSDataInputStream, FSDataOutputStream
          contain the methods isSymlink() and getSymlink()?
          Or is it for overloading the return type to process the remote symbolic links?
          To be consistent with the rest of your patch, wouldn't it have been better to declare the corresponding FSLinkXXX types for each of the XXX above?

          Sanjay Radia added a comment -

          > Let's not call something 'unclean' just because it is more work.

          It has nothing to do with more work – I am always willing to do more work for the right clean solution.
          Konstantine also said it is "bizarre" that all the methods return FSLink. I am not the only one finding this "unclean" or "bizarre".

          There is a fundamental difference of opinion here: several of us are arguing that the use of exceptions is fine for recoverable conditions. It keeps the interface simple and clean.

          BTW Do you find such use of exceptions to be acceptable inside the name node? Both Raghu and I raised this.
          Are you suggesting that the namenode side of the code be changed to not use exceptions?
          If so, you are consistent, and it will be some work to change the internal name node code to use the FSLink-style approach; dhruba's patch does not do this.
          If not, why the inconsistency in when exceptions are appropriate or not appropriate?

          >The interfaces should reflect their function.
          Having create() return FSLink does not help understand its function.
          Return types like "FSLinkBoolean" and FSLinkBlockLocations confuse the function of the methods in an interface.
          The symbolic link issue cuts across all interfaces that have path names. Overloading the return type of each method that has path name parameters
          dilutes how well the method signature reflects its function.
          The method signatures and datatypes are easier to understand if we use exceptions (which in the opinion of some of us is acceptable for recoverable conditions).

          > Other FSs (e.g., linux VFS) don't try to do this in a single call, but rather loop, checking whether each element in the path is a link.
          No one is suggesting that we handle it as a single call – with exceptions it will loop in exactly the same fashion. Perhaps you have misunderstood the
          alternate solution being proposed.

          >We can try to hack around that by using exceptions or thread locals or output parameters,
          > but those are all just attempts to try to make this API look
          >like other APIs rather than embrace its differences.
          (Q. By overloading the return type aren't you using output parameters? The return type is an output parameter.)
          Many will consider the overloading of return types to be hacky as well (and this will get worse when, in the future, the client side has other reasons to perform a recovery function). Clearly we have a difference of opinion on what is hacky or not.

          Comparing to other file systems' APIs is not fair here for two reasons:
          1) they do not use a programming language that has first class support for exceptions (both checked and unchecked).
          2) Mounts and symbolic links in most systems are handled on the server side - in most FSs, clients cannot process links.
          The only possible exception (no pun intended) is Slash-relative symbolic links in NFS (I don't know if NFS supports slash-relative symbolic links that are kicked back to be relative to the client's view of Slash (root)).

          In the two approaches being discussed in this Jira, both are looping to process the operation across links/mount points. The difference is how the
          condition is communicated back to caller and the impact it has on the method signatures and data types used as parameters or return types.
          There is a clear difference of opinion here.
          I am interpreting your reference URL link differently - I am reading it and seeing that using exceptions for recovery is acceptable. Further, following a mount-point link is not normal and hence the use of an exception is acceptable. Joshua's Effective Java also, as per my reading, advocates the use of checked exceptions for recoverable conditions. You will probably interpret that text differently or disagree with it.

          Tsz Wo Nicholas Sze added a comment - - edited

          > Good question. I prefer FsXxx, but we've not been consistent in Hadoop, nor is Java. I think java.net.URL was a mistake, that it should have been named java.net.Url, that acronyms in class names should be capitalized, not all caps. This makes them easier to parse.

          It is time to rename all the FSXxx classes since the public API is going to be changed a lot in 0.20 anyway. I have created HADOOP-4357 for renaming the classes.

          Doug Cutting added a comment -

          > public class FsReturnType<T>

          This looks good to me.

          > BTW, should our naming convention be FsXxx or FSXxx?

          Good question. I prefer FsXxx, but we've not been consistent in Hadoop, nor is Java. I think java.net.URL was a mistake, that it should have been named java.net.Url, that acronyms in class names should be capitalized, not all caps. This makes them easier to parse.

          Tsz Wo Nicholas Sze added a comment -

          I think it is wrong to combine FSLink and the original return type like boolean, BlockLocations, etc. to create new classes. The problem here is that we need to return two items in a single method call. The usual approach is to return a container, say FsReturnType<T>. I suggest the following API:

          //T is the original type
          public class FsReturnType<T> {
            public T getValue() {
              ...
            }
          
            Path getLink() throws IOException {
              ...
            }
          }
          

          BTW, should our naming convention be FsXxx or FSXxx?
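
          And for illustration, a client-side loop over such a container might look roughly like this (getFileStatusInternal is a hypothetical protocol method, and getLink() is assumed to return null when no link was hit):

          FsReturnType<FileStatus> r = namenode.getFileStatusInternal(path);
          while (r.getLink() != null) {                       // a link came back instead of a value
            r = namenode.getFileStatusInternal(r.getLink());  // chase the link and retry
          }
          FileStatus stat = r.getValue();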
