Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: fs
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Add new file system interface AbstractFileSystem with implementation of some file systems that delegate to old FileSystem.

      Description

      The FileContext API (HADOOP-4952) provides an improved interface for the application writer.
      This lets us simplify the FileSystem API since it will no longer need to deal with notions of default filesystem [ / ], wd, and config
      defaults for blocksize, replication factor etc. Further it will not need the many overloaded methods for create() and open() since
      the FileContext API provides that convenience.
      The FileSystem API can be simplified and can now be restricted to those implementing new file systems.

      This jira proposes that we create new file system API, and deprecate FileSystem API after a few releases.

      1. AbstractFileSystem.java
        18 kB
        Sanjay Radia
      2. afs1.patch
        61 kB
        Sanjay Radia
      3. AFS10.patch
        109 kB
        Sanjay Radia
      4. AFS11.patch
        109 kB
        Sanjay Radia
      5. AFS15.patch
        100 kB
        Sanjay Radia
      6. AFS16.patch
        109 kB
        Suresh Srinivas
      7. AFS17.patch
        113 kB
        Sanjay Radia
      8. AFS18.patch
        113 kB
        Sanjay Radia
      9. AFS4.patch
        94 kB
        Sanjay Radia
      10. Hdfs.java
        5 kB
        Sanjay Radia
      11. Hdfs.java
        4 kB
        Sanjay Radia

        Issue Links

          Activity

          Sanjay Radia created issue -
          Sanjay Radia made changes -
          Field Original Value New Value
          Assignee Sanjay Radia [ sanjay.radia ]
          Doug Cutting made changes -
          Comment [ I looked through FileContext9.patch in detail last night. My notes are below. Obviously this is still a preliminary patch, but I wasn't sure I'd have time to look again for a few weeks, so figured I'd give a detailed review now - feel free to ignore notes about things that are preliminary bits and won't be in the final patch.

          FileContext.java;
            nits:
              - apache license headers

              - typos:
                FileContext.java:22 operationS
                "imples" -> implies
                "imples a default" (extra space)

              - check for more than 80 columns (eg FileContext.java:33, 39)

              - "file related" -> "file-related" (line 56)

              - "all file systems instance" -> "all file system instances"

              - overall "filesystem" seems better to me than "file system"

              - inconsistent use of "you" vs "one" becomes confusing to read

              - Extraneous bit of javadoc left right before FileContext class definition - looks like notes to self.

              - indentation on initFromConfig method is too deep

              - javadoc for getFSofPath is not filled in

              - extra space before "theConfig" on line 195

              - extra ';' on line 209

              - new member variables defined in the middle of the class (LOCAL_FS_URI and defaultPerm)

              - typo: "relavent" line 267

              - javadoc for setWorkingDirectory refers to "newWdir" but the param is called "p". The description should be under the @param p section.

              - indentation on line 513

              - Javadoc for setTimes: I'd rather see "since the epoch" instead of "since Jan 1, 1970".

            code:

              - "conf" is the common name rather than "theConfig"

              - rather than FileSystem.getInitialWorkingDirectory() returning null by default and having the check in FileContext, have it default to just passing through to getHomeDirectory() in FileSystem.java

              - is defaultFsURI.equals(defaultFS.getUri()) an invariant? If so, why are there two separate member variables? If not, when is that not the case?

              - could makeAbsolute be made public? It seems generally useful

              - style: "isNotSchemeWithRelative" should be called "checkNotSchemeWithRelative" - otherwise it looks like it should return a boolean

              - also, that function claims to throw IOException if it's of the wrong type, but in fact it throws IllegalArgumentException.

              - are these kinds of paths *ever* legal in Hadoop? If not, can this check go into the Path constructor such that we can never end up with an invalid object?

              - getFSofPath seems like a hackish implementation, but unfortunately that's the API that FileSystem gives. Could that function in FileSystem be refactored a bit so there is a more sensical way of checking ownership of a path without having to throw/catch an exception like this?

              - JavaDoc for FileContext() constructor - again this is the normal behavior for hadoop configuration. I think you can leave it at just "using a default Configuration". In fact, I am sort of in favor of removing this constructor entirely and forcing the user to explicitly choose to construct a new Configuration().

              - the static factory methods (getFileContext, getLocalFSFileContext) seem confusing in that they are not singleton instances. I can see this tripping people up - if you think they should be new instances every time it might be worth renaming the methods to "createFileContext" or "newFileContext".

              - I'm not sure I see the point of the factory methods that call setDefaultFileSystem for you - again the "get" name makes it seem like it would be the same instance every time if you call it multiple times on the same FS.

              - Why allow the user to pass either URI or FileSystem instances? There's less code if you just provide one, and the user can always go from one to the other. I'm in favor of fewer code paths where possible.

              - setWorkingDirectory - why not call makeAbsolute here?

              - CreatOpts should be CreateOptions. I think Ken Thompson once remarked that if he had to do anything differently in his past he would have spelled O_CREAT right :)

              - ReflactionFac -> ReplicationFactor

              - I'm not sold on the varargs/CreatOpts solution. What about having a class using boxed Integers, where "null" means "not set by the user". Then use the method chaining pattern so you can do new CreateOptions().setDefaultReplication(3).setBlockSize(64*1024*1024).etc.etc.etc. Given this structure we could refactor the general FileSystem interface to take that single parameter, rather than the hard-to-migrate lots-of-arguments-to-open route we've got now.

              - If left that way, at least change the sequence of "ifs" to "else if's, and add an else clause at the end with "throw new IllegalArgumentException(...)"

              - mkdirs: this javadoc claims that it returns false if the directory already exists, but in my experience that isn't true, at least with HDFS. I vaguely recall you opened another JIRA about that elsewhere, so maybe out-of-scope for this one.

              - mkdirs: either don't accept null as an argument, or document that null means defaults in the javadoc

              - style: abs_dir -> absDir

              - delete: another maybe out-of-scope thing: I believe delete with recursive=false will still fail even on an empty directory. Would be nice to clear that up before 0.21.

              - rename: would be nice to clarify these semantics as well. Again I think I saw another JIRA about this.

              - rename: rather than comparing URIs for equality, would it be possible to implement FileSystem.equals to default to using URIs, but allow other classes to override?

              - setOwner: some kind of assertion that (username != null || groupname != null)?

              - setVerifyChecksum(bool) - since this isn't Path-specific, it seems odd that it's present here. The pattern imho should be anything that takes a Path should be present in FileContext and forward through to the appropriate FS, but non-Path-specific stuff shouldn't be delegated.

              - the "Not implemented yet" method should just be removed, imo

              - getFSStatus might be better named getFsStatus to match the type. Also I think better to call the no-param getStatus() rather than getStatus(null), stylewise

              - getFSStatus() doesn't makeAbsolute the path

              - Why does the inner class have a redundant reference thisFC? Since it's not a static inner class this reference already exists implicitly. The purpose of this inner class also isn't clear to me.

              - Agree with the "TBD not very clear" on line 1263 - this code path is really hard to follow.

              - Line 1263: should use srcFS.equals(dstFS) here

          overall:


            - I'm not 100% sold that your working directory can be on a different filesystem than the default filesystem. In particular, I think it's unexpected that "foo" and "/foo" might be on different filesystems. Could this be changed such that there's a concept of a "home directory" and "working directory", and you can always "cd home". Then absolute paths and relative paths would always be on the same filesystem as the working directory.

            - I'm unclear of the definition of "Server Side Defaults" from the JavaDoc - are they in fact server side? They're just configurations, right? In general the discussion of Configuration in FileContext seems somewhat extraneous - the behavior here is the same as it is everywhere else in Hadoop, until HDFS-578 is done. If HDFS-578 is a true pre-requisite to this, can you link the tickets?

            - the isNotSchemeWithRelative function is only called in a few places - it's unclear to me whether this is on purpose or not. Maybe this should just be part of makeAbsolute?

            - There's already some confusion/inclarity in the Hadoop docs about what "qualified" vs "absolute" means in terms of paths. If the JavaDocs here could help clarify that, I think that would be a big plus. (what's the difference between makeQualified and makeAbsolute?)

            - The "TBD"s should be changed to "TODOs" so that they flag on the Hudson test-patch. And, of course, they should be fixed before this gets committed.

            - A lot of the stuff in util is copied straight from FileSystem.java. This code duplication should be avoided. This also includes DEFAULT_FILTER, GlobFilter below



          FsConfig.java:

          nits:
            - typo: line 10: static method*s*
            - missing apache license
            - "see the note on the javdoc" -> jav*a*doc

          code:
            - the "NOTE" section of the class javadoc doesn't make sense to me. What Files class?

            - What is SERVER_DEFAULT(-1) referring to? I see neither the constant nor a literal -1 anywhere. [edit: looks like this is part of HDFS-578, see above]

            - FS_HOME_DIR default isn't quite correct.. it would be /user/<foo> but I don't think you can do a ${...} style substitution from this context. So the calculation of the default would have to be done in code, or else just not provide a default at all and throw an exception if for some reason it can't be grabbed from the Configuration object.

            - Would be nice to have javadocs for each of the static getters

            - getImplClass: check that the URI has a scheme. If it has no scheme, throw an IllegalArgumentException? Add javadoc to that effect.

            - The class javadoc indicates that there can be instances of this class, but it is entirely made of static methods. Remove that section of the javadoc and make the class abstract and final.


          FileSystem.java:
            - line 544 typo agains -> against
            - typo: absolutPermission -> absolutePermission (perhaps this is a new brand of Hadoop sponsored vodka! +1!)


          FileContextBaseTest.java:

            - line 102: should duplicate the assertion from line 103 before setting the working directory here too - make sure we ended up at the right absolute path by means of the relative "cd"s

            - should test setWorkingDirectory on a nonexistent dir

            - also, all of the setWorkingDirectory calls are called with already-qualified paths. Should try some with non-qualified paths too.

            - test on line 117 confirms my earlier comment that mkdirs returns true if you call it on an already-existing directory
          ]
          Tsz Wo Nicholas Sze made changes -
          Link This issue relates to HADOOP-4357 [ HADOOP-4357 ]
          Sanjay Radia made changes -
          Link This issue is related to HADOOP-4952 [ HADOOP-4952 ]
          Sanjay Radia made changes -
          Link This issue relates to HADOOP-6265 [ HADOOP-6265 ]
          Sanjay Radia made changes -
          Attachment AbstractFileSystem.java [ 12420565 ]
          Attachment Hdfs.java [ 12420566 ]
          Sanjay Radia made changes -
          Attachment afs1.patch [ 12421030 ]
          Attachment Hdfs.java [ 12421031 ]
          Sanjay Radia made changes -
          Attachment AFS4.patch [ 12421768 ]
          Sanjay Radia made changes -
          Attachment AFS8.patch [ 12421848 ]
          Sanjay Radia made changes -
          Attachment AFS8.patch [ 12421848 ]
          Sanjay Radia made changes -
          Attachment AFS9.patch [ 12421856 ]
          Sanjay Radia made changes -
          Attachment AFS9.patch [ 12421856 ]
          Sanjay Radia made changes -
          Attachment AFS10.patch [ 12421858 ]
          Sanjay Radia made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Sanjay Radia made changes -
          Attachment AFS11.patch [ 12421928 ]
          Sanjay Radia made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Sanjay Radia made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Sanjay Radia made changes -
          Link This issue blocks HDFS-702 [ HDFS-702 ]
          Sanjay Radia made changes -
          Attachment AFS15.patch [ 12422261 ]
          Suresh Srinivas made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Suresh Srinivas made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Suresh Srinivas made changes -
          Attachment AFS16.patch [ 12422657 ]
          Suresh Srinivas made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Suresh Srinivas made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Aaron Kimball made changes -
          Link This issue is related to HADOOP-6324 [ HADOOP-6324 ]
          Sanjay Radia made changes -
          Attachment AFS17.patch [ 12423587 ]
          Sanjay Radia made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Sanjay Radia made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Sanjay Radia made changes -
          Attachment AFS18.patch [ 12423627 ]
          Sanjay Radia made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Sanjay Radia made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Suresh Srinivas made changes -
          Hadoop Flags [Reviewed]
          Release Note Add new file system interface AbstractFileSystem with implementation of some file systems that delegate to old FileSystem.
          Fix Version/s 0.22.0 [ 12314296 ]
          Component/s fs [ 12310689 ]
          Suresh Srinivas made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Tom White made changes -
          Fix Version/s 0.21.0 [ 12313563 ]
          Fix Version/s 0.22.0 [ 12314296 ]
          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Gavin made changes -
          Link This issue blocks HDFS-702 [ HDFS-702 ]
          Gavin made changes -
          Link This issue is depended upon by HDFS-702 [ HDFS-702 ]

            People

            • Assignee:
              Sanjay Radia
              Reporter:
              Sanjay Radia
            • Votes:
              0 Vote for this issue
              Watchers:
              21 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development