Hadoop Common / HADOOP-4952

Improved file system interface for the application writer.

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: fs
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      New FileContext API introduced to replace FileSystem API. FileContext will be the version-compatible API for future releases. FileSystem API will be deprecated in the next release.

      Description

      Currently the FileSystem interface serves two purposes:

      • an application writer's interface for using the Hadoop file system
      • a file system implementer's interface (e.g. hdfs, local file system, kfs, etc)

      This Jira proposes that we provide a simpler interface for the application writer and leave the FileSystem interface for the implementer of a file system.

      • The FileSystem interface has a confusing set of methods for the application writer
      • We could make it easier to take advantage of URI file naming
        • The current approach is to get a FileSystem instance by supplying a URI and then access that namespace. It is consistent for the FileSystem instance not to accept URIs for other schemes, but we can do better.
        • The special copyFromLocalFile can be generalized as a copyFile where the src or target can be any URI, including a local one.
        • The proposed scheme (below) simplifies this.
      • The client side config can be simplified.
        • A new config() by default uses the default config. Since this is the common usage pattern, one should not need to pass the config as a parameter every time the file system is accessed.
        • The current config does not handle multiple file systems well. Today a site.xml is derived from a single Hadoop cluster; this does not make sense for multiple Hadoop clusters, which may have different defaults.
        • Further, one should need very little to configure the client side:
          • Default file system.
          • Block size
          • Replication factor
          • Scheme to class mapping
        • It should be possible to take block size and replication factor defaults from the target file system, rather than the client-side config. I am not suggesting we disallow setting client-side defaults, but most clients do not care and would find it simpler to take the defaults from the target file system. (A rough usage sketch of the proposed interface follows this list.)
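
      The sketch below illustrates the kind of application-writer interface being proposed. It follows the FileContext API that eventually came out of this issue; the exact class and method signatures varied across the attached patches, so treat it as illustrative rather than definitive.

        import java.util.EnumSet;

        import org.apache.hadoop.fs.CreateFlag;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileContext;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.fs.permission.FsPermission;

        public class FileContextSketch {
          public static void main(String[] args) throws Exception {
            // Names are resolved against the configured default file system (e.g. HDFS).
            FileContext fc = FileContext.getFileContext();
            fc.mkdir(new Path("/results"), FsPermission.getDefault(), true);
            // No block size or replication arguments: server-side defaults apply.
            FSDataOutputStream out =
                fc.create(new Path("/results/part-0"), EnumSet.of(CreateFlag.CREATE));
            out.writeUTF("hello");
            out.close();
          }
        }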

      Attachments

      1. FileContext-common25.patch
        92 kB
        Sanjay Radia
      2. FileContext-common24.patch
        92 kB
        Sanjay Radia
      3. FileContext-common22.patch
        91 kB
        Sanjay Radia
      4. FileContext-common21.patch
        92 kB
        Sanjay Radia
      5. FileContext-common19.patch
        90 kB
        Sanjay Radia
      6. FileContext-common18.patch
        88 kB
        Sanjay Radia
      7. FileContext-common16.patch
        88 kB
        Sanjay Radia
      8. FileContext-common14.patch
        88 kB
        Sanjay Radia
      9. FileContext-common13.patch
        88 kB
        Sanjay Radia
      10. FileContext-common12.patch
        88 kB
        Sanjay Radia
      11. FileContext-hdfs11.patch
        8 kB
        Sanjay Radia
      12. FileContext-common11.patch
        83 kB
        Sanjay Radia
      13. FileContext-hdfs10.patch
        5 kB
        Sanjay Radia
      14. FileContext-common10.patch
        81 kB
        Sanjay Radia
      15. FileContext9.patch
        79 kB
        Sanjay Radia
      16. FileContext7.patch
        73 kB
        Sanjay Radia
      17. FileContext6.patch
        70 kB
        Sanjay Radia
      18. FileContext5.patch
        68 kB
        Sanjay Radia
      19. FileContext3.patch
        65 kB
        Sanjay Radia
      20. FilesContext2.patch
        66 kB
        Sanjay Radia
      21. FilesContext1.patch
        53 kB
        Sanjay Radia
      22. Files.java
        15 kB
        Sanjay Radia
      23. Files.java
        11 kB
        Sanjay Radia

        Issue Links

          Activity

          Sanjay Radia created issue -
          Sanjay Radia added a comment -

          The attached Files.java is a class for the proposed new interface for the application writer.
          The FileSystem class will remain for the fs implementor (after some cleanup); it will also be used to implement the
          Files class.
          None of the methods in the Files class have an implementation yet - the skeleton is to discuss the proposal.

          Sanjay Radia made changes -
          Field Original Value New Value
          Attachment Files.java [ 12396861 ]
          Tom White added a comment -

          Wouldn't it make more sense to keep the FileSystem interface for clients and evolve it, and move the implementor's interface to an SPI?

          Also, instead of creating a new interface, we might be able to use the new FileSystem interface being developed in JSR 203. See HADOOP-3518.

          Doug Cutting added a comment -

          It might be nice to make it simpler to open a file given a URI and assuming the default configuration, but there are hazards to that as well. Application code often becomes library code, and changing code that does not explicitly pass a configuration to start passing a configuration breaks compatibility. The Hadoop API convention is that everything must be explicitly passed a configuration, so that we do not rely on static state. This is just a cost of doing business, and it's dangerous to remove it.

          So I agree that static utility methods would be nice, since the FileSystem implementation can be automatically inferred from the configuration and the URI, but I think the Configuration should still be explicitly passed to these methods. If we don't think that applications should ever need to call the non-static FileSystem methods, then placing the static methods in a different class makes sense, since it consolidates the documentation that users need to be aware of. Does that make sense to you, Tom?

          Also, create() has too many parameters and is fragile. We should add a FileCreation class that encapsulates these. We should consider adding such classes for any methods that take more than a Configuration and a Path.

          Sanjay Radia added a comment -

          Doug do you mean you would prefer something like:
          Files.create(config, path, ...)
          Files.open(config, path, ...)

          The exportConfig and importConfig methods in the proposed patch (Owen's suggestion) are for the library writer, but the library writer has to be aware of it.

          Sanjay Radia added a comment -

          > Also, create() has too many parameters and is fragile.

          The proposed Files interface significantly reduces the number of create methods. It accomplishes this partly by
          allowing one to pass -1 for certain parameters (block size, replication factor, buffer size) to use their default values.

          Thus the first create method is very simple and robust for type checking:

            public static FSDataOutputStream create(Path f,
                                              FsPermission permission,
                                              boolean overwrite) throws IOException 
          

          The second create method is fragile since many of the parameters are ints and one can easily mix up the parameters.

          Given that most app writers will use the above simpler method, I am not as worried as I used to be.

          Were you thinking of something like:

          // Style 1
          createdFile = create(path, permission, overwrite);
          createdFile.setBlockSize(512);   // optional; most are okay with the defaults
          outputStream = createdFile.open();

          // Style 2
          outputStream = create(path, permission, overwrite);   // the common case where the defaults are fine

          createParms.setBlockSize(512);                        // for when the defaults are not fine
          outputStream = create(path, permission, overwrite, createParms);

          // For comparison, the proposed patch would have required the following
          outputStream = create(path, permission, overwrite);                        // common case when defaults are fine
          outputStream = create(path, permission, overwrite, 512, -1, -1, progress); // when you want to override the defaults
          
          
          Doug Cutting added a comment -

          > Doug do you mean you would prefer something like: Files.create(config, path, ...)

          Yes. We require a configuration in nearly every other API so I don't think eliminating one here actually saves much and it makes it likely that folks will write code that mistakenly assumes the default configuration.

          As for create options we might try something like:

          outputStream = File.create(config, path, new FsCreate().permission(p).blocksize(b));

          We might also provide a version like:

          outputStream = File.create(config, path);

          that is equivalent to:

          outputStream = File.create(config, path, new FsCreate());
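
          A minimal sketch of the builder-style options object in the example above. FsCreate here is hypothetical (it is not an existing Hadoop class), and File.create(config, path, opts) is the proposed static method rather than a current API:

            import org.apache.hadoop.fs.permission.FsPermission;

            public class FsCreate {
              private FsPermission permission = FsPermission.getDefault();
              private long blocksize = -1;      // -1: defer to the file system's default
              private short replication = -1;   // -1: defer to the file system's default

              public FsCreate permission(FsPermission p) { this.permission = p; return this; }
              public FsCreate blocksize(long b)          { this.blocksize = b; return this; }
              public FsCreate replication(short r)       { this.replication = r; return this; }
            }

            // Usage, following the examples above:
            //   outputStream = File.create(config, path, new FsCreate().permission(p).blocksize(b));
            //   outputStream = File.create(config, path);  // same as passing new FsCreate()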

          Sanjay Radia added a comment -

          Doug> Application code often becomes library code, and changing code that does not explicitly pass a configuration to start passing a configuration breaks compatibility.

          I guess I am missing this library argument and having a hard time buying the argument "the rest of the Hadoop APIs do this and so we need to do this here". Most systems do not have such a config parameter in their APIs, and they support both application code and library code. What is so different about Hadoop and its libraries? Some examples of the libraries you have in mind would help.
          In most file systems, the config state is provided by the underlying OS; it includes the root and the working dir. Libraries need to set these if the default is not good enough.

          Most file system interfaces are dead simple. I think Hadoop's API can be made equally simple and at the same time allow complex libraries to be built relatively easily. I thought the exportConfig/importConfig facilitates that.

          Your use-case is where a piece of application code suddenly turns into library code.
          The application code would have used the default config; when it turns into a library, the line of code in your proposal that says "config = new Config()" would have to change to "config = callersConfigArg".
          In my proposal the library writer will have to add a line "Files.importConfig(configArg)".
          Either way one line of code has to change: setting the default config or passing in a different config argument.

          I think there is more to the underlying environment than just the config; hence libraries have to be more careful. Owen, Arun and I discussed the task tracker and whether or not it can do any work on behalf of the job (either some pre-work, such as loading the cache, or the task itself). The task tracker has to take the credentials and the class path from the job submitter's environment. For example, when the task tracker acts on behalf of the job submitter, it has to do the equivalent of JAAS's callerSubject.doAs().

          Doug Cutting added a comment -

          > I guess I am missing this library argument [ ... ]

          What I mean is that there isn't a clear line between application and library code, that folks might wish to be able to, e.g., start running jobs on two clusters at once, without having to refactor everything to keep track of the config. We should not encourage a style that makes this kind of thing more difficult, but rather one that makes it easy.

          > Most file system interfaces are dead simple. I think Hadoop's API can be made equally simple [ ... ]

          I agree. But I also think we need to pass the configuration explicitly, and not depend on a static configuration. Our current convention is that only command-line tools create all-new configurations and I think there are good reasons to stick with that.

          > In most file systems, the config state is provided by the underlying OS;

          The configuration is the equivalent of the unix environment. Using per-process environment variables makes sense when you start new processes for new operations, the unix convention, but that's not the way folks generally work in Java. We could use a static configuration, in fact we used to long ago, but that caused lots of problems. In other projects I've frequently seen folks start out using Java system properties, then switch to something that's passed dynamically. That sort of switch is painful and something we should avoid and encourage others to avoid.

          > In my proposal the libray writer will have to add a line "Files.importConfig(configArg)".

          That does not work for multi-threaded applications, where different threads are configured differently. Consider, e.g., a Nutch-based system that has different search zones (like, e.g., http://www.archive-it.org/). This is a real and common use case. Some developers use Hadoop's APIs directly, but many if not more layer things on top of it (Hive, HBase, Pig, Jaql, Nutch). It's reasonable for any of these, or just about any Hadoop application, to be repackaged as a multi-user service. Folks should not have to refactor to do that.

          Sanjay Radia added a comment -

          Thanks for the clarifications; I understand the issue better.

          >> In my proposal the library writer will have to add a line "Files.importConfig(configArg)".
          > That does not work for multi-threaded applications, where different threads are configured differently.

          Depends on how you implement it. Thread-local variables (as we do for the UGI credentials) would address MT.

          We have at least the following 3 options

          1) Class with static methods (as per patch) using thread local variables to store the
          default config.

          2) Class with static methods and a config parameter for each method

          3) Class with instance methods

          class Files {
             static Files defaultConf();   // uses the default config
             Files(Configuration config);  // uses the caller's config
             // open, create etc. are all instance methods
          }

          myFiles = Files.defaultConf(); // common usage
          myFiles = new Files(myConf);   // for those that want a specific conf

          myFiles.create(...)
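
          A minimal sketch of option 1 above (static methods plus a thread-local default config). The Files class and its methods here are hypothetical and only illustrate the mechanism:

            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.fs.Path;

            public final class Files {
              // Per-thread default config, as is done for the UGI credentials.
              private static final ThreadLocal<Configuration> CONF =
                  new ThreadLocal<Configuration>() {
                    @Override protected Configuration initialValue() { return new Configuration(); }
                  };

              // Library code calls this to operate under its caller's config (importConfig).
              public static void importConfig(Configuration conf) { CONF.set(conf); }

              public static void create(Path f /* , permission, overwrite, ... */) {
                Configuration conf = CONF.get();
                // ... resolve the default file system from conf and create the file ...
              }

              private Files() {}
            }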
          
          
          Sanjay Radia added a comment -

          I have taken an initial look at the new NIO2 coming in jdk7 (JSR 203 - http://openjdk.java.net/projects/nio/)

          NIO2 has notions of multiple file system providers (like hadoop) and also has a URI based file system.

          Technical considerations for adopting JSR 203:

          1. Does our file naming fit with that of NIO2?
          2. Is the NIO2 FileSystem provider interface sufficient for expressing Hadoop? (HADOOP-3518 proposes this.)
          3. Others?

          There are non-technical issues such as the timing of NIO2 and jdk7; Alan Bateman,
          the spec lead for JSR 203, has informed us that jdk7 will come out in 2010 and that Google is considering backporting NIO2 to jdk 6 under a different package name.

          In NIO2, slash-relative names like /foo/bar are resolved relative to the JVM's default file system.
          wd-relative file names are also resolved relative to the JVM's file system.
          This makes a lot of sense in Java: Java needs to be consistent with the older java.io File APIs; i.e. a slash-relative or wd-relative name means the same thing whether you use the NIO2 APIs or the other java.io APIs.

          However, Hadoop has a slightly different notion of default file system. Hadoop file names are as follows:

          1. Fully qualified URIs are resolved against the specified file system: hdfs://nn_adr:port/foo/bar
          2. file:/// URIs are relative to the local file system
          3. Slash-relative names: these are resolved relative to the default Hadoop file system.
            Typically in a Hadoop cluster the default file system is the file system of the cluster. Applications by default inherit this config; a user can set his default file system to any file system.
          4. wd-relative names are relative to the working directory

          Pathnames of type (3) or (4) are not consistent with NIO2 and the rest of Java io.

          Consider alternate name conventions like hfs:/foo/bar and hfs:foo/bar as the Hadoop default root and Hadoop working directory: IMHO, these would be too inconvenient: e.g. at the shell, instead of typing "ls foo/bar" one would have to type "ls hfs:foo/bar" to list files relative to the working directory.

          I don't see how we can escape this dual notion of root and wd:
          In the code, the distinction using the JSR 203 APIs will be roughly:

            Path p = Paths.get("/foo/bar");           // relative to the JVM's default fs
            Path p = HDFS_provider.get("/foo/bar");   // relative to Hadoop's default fs
            Path q = Paths.get("foo/bar");            // relative to the JVM's default fs wd
            Path q = HDFS_provider.get("foo/bar");    // relative to Hadoop's wd
          

          The other thing to evaluate is the file system provider interface. Please continue this discussion on HADOOP-3518. If JSR 203 provider interfaces are not
          sufficient then we need to work with the JSR203 expert group to influence its APIs.

          Also, wrt timing, we will have to move forward with our APIs for Hadoop 1.0
          since JSR 203 will not be finalized by the time Hadoop 1.0 happens.

          Johan Liesén added a comment -

          I do not fully grok the problem here. Is the meaning of a working directory different in HDFS from that in the local file system?

          I've just briefly browsed through the NIO.2 FS API and, as far as I can tell, the notion of working directory doesn't really exist. It is only applicable to file systems where the environment (working directory, or similar) is implicitly available (or if there is none). In my opinion, this is a good thing.

          In this respect: Paths.get("hdfs://...") doesn't make much sense because the environment for HDFS, the Configuration object, is not available nor is a FileSystem reference.

          Wrt. 3. I'm not sure I understand the meaning of "slash relative names"; isn't that an absolute path?

          Doug Cutting added a comment -

          > Is the meaning of a working directory different in HDFS from that in the local file system?

          Yes. The local filesystem's working directory is global to the JVM and read-only, while Hadoop FileSystem instances each have a working directory that may be changed. For LocalFileSystem, the working directory is initially the JVM's working directory. For HDFS, the working directory is initially the user's home directory in that filesystem (/user/${username}).

          Relative paths are resolved against the working directory. So, "foo" refers to a file in the working directory of the default filesystem while "file:foo" refers to a file in the working directory of the local filesystem. The default filesystem is determined from the Configuration in effect.
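
          For illustration, a small sketch of these resolution rules with the existing FileSystem API; the printed URIs depend on fs.default.name and the user's home directory, so they are only indicative:

            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.fs.FileSystem;
            import org.apache.hadoop.fs.Path;

            public class WorkingDirDemo {
              public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                FileSystem fs = FileSystem.get(conf);   // the configured default file system
                // Resolved against the default FS's working dir (for HDFS, initially /user/<name>):
                System.out.println(fs.makeQualified(new Path("foo")));
                FileSystem local = FileSystem.getLocal(conf);
                // Resolved against the local FS's working dir (initially the JVM's working dir):
                System.out.println(local.makeQualified(new Path("foo")));
              }
            }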

          Johan Liesén added a comment -

          In the NIO.2 API there is no concept of a working directory. But paths can be resolved against any directory, not just a working directory, so it's easy to introduce a working directory concept at the application level.

          Doug Cutting added a comment -

          > In the NIO.2 API there is no concept of a working directory.

          Rather, I suspect the working directory concept in NIO.2 is specific to the URI scheme, as in Hadoop, so that for file: URIs the Java process's working directory would apply, and file:foo refers to foo in the current directory in both APIs.

          A material difference (that Sanjay alluded to above) is that the default filesystem (uri scheme & authority) in NIO.2 is fixed to be the local filesystem (file:), while in Hadoop it is configurable. So "foo" in NIO.2 always refers to "file:foo", while in Hadoop it might refer to "hdfs:foo" if HDFS is configured as the default filesystem.

          Johan Liesén added a comment -

          Yes. I'm not saying that the API defined in JSR 203 maps onto HDFS perfectly.

          So, after proposing this as a GSOC project in https://issues.apache.org/jira/browse/HADOOP-3518 and e-mailing back and forth with Sanjay and Alan Bateman, I decided to send in my application even though Hadoop hasn't listed any projects on the ASF's website. It's up for review at: http://docs.google.com/Doc?docid=dc5ttndv_36d83k5wqx (and, officially, at http://socghop.appspot.com/student_proposal/show/google/gsoc2009/l/t123854188327 for mentors).

          Hopefully, you guys think it's a good idea, too. I'll be very glad for any suggestions or ideas. (The application deadline has passed, so the application can't be changed. But all other feedback is appreciated.)

          Sanjay Radia added a comment -

          The attached Files.java is a class for the proposed new interface for the application writer.
          The FileSystem class will remain for the fs implementor (after some cleanup); it will also be used to implement the
          Files class.
          Most of the methods in the Files class do not have an implementation yet - the skeleton is to discuss the proposal.

          Sanjay Radia made changes -
          Attachment Files.java [ 12416174 ]
          Sanjay Radia added a comment -

          The attached FilesContext.java is a class for the proposed new interface for the application writer. (I have used the name FilesContext instead of the original class name Files since the class/object is really a context in which files
          are named and created; further, it avoids confusion with the Java class File.)
          The FileSystem class will remain for those implementing file systems (after some cleanup); we will probably rename it
          to AbstractFileSystem or VirtualFileSystem (since it is very similar to the Unix VFS).
          The FilesContext class is a thin layer that mostly calls FileSystem.

          There are still a few methods that remain to be implemented (such as copy, move, glob).

          Special things to note when reviewing the new interface

          • See the examples on how to use FilesContext in the javadoc.
          • Added the notion of server-side defaults using SERVER_DEFAULT
            (essentially its value is -1 and is passed across to the NN to use its defaults
            for ReplicationFactor, blockSize etc.)
          • Unlike FileSystem, FilesContext provides only 2 methods for create:
            1. FSDataOutputStream create(Path f,
              FsPermission permission,
              EnumSet<CreateFlag> createFlag)
            2. FSDataOutputStream create(Path f,
              FsPermission permission,
              EnumSet<CreateFlag> createFlag,
              int bufferSize,
              short replication,
              long blockSize,
              Progressable progress)
            • Note that the first one uses the defaults as expected.
              In the 2nd one, you can still use defaults for individual parameters; e.g.
              create("foo", FSPermissions.default(), EnumSet.of(CREATE), 4096, SERVER_DEFAULT, CONFIG_DEFAULT, null)
              In this case the buffer size is 4096, the SS default is used for the replication factor, and the
              config default for the blockSize.
          • Note the utility functions are in an inner class called Util.
            The reason I have not put them in an independent class is that one needs the
            FilesContext for these functions (i.e. to know what / and wd are, and also the
            replication factor etc. for new files created by the utility methods such as copy()).
          Sanjay Radia made changes -
          Attachment FilesContext1.patch [ 12417552 ]
          Sanjay Radia made changes -
          Attachment FilesContext2.patch [ 12417627 ]
          Doug Cutting added a comment -

          Some naming nits:

          • I'd prefer to name this just Files or perhaps FileContext rather than FilesContext.
          • FSConfigUtil could simply be called FSConfig. Moreover, its methods could be FileContext methods, no? Why do we need this as a separate class?
          • Why do we need the inner class FSConfig? Its fields seem like they should be FileContext fields directly, no?
          Sanjay Radia made changes -
          Attachment FileContext3.patch [ 12417842 ]
          Sanjay Radia added a comment -

          Updated patch.
          (The patch is not complete yet; it has missing parts and some bugs I know of.)

          Addressed Doug's naming nits + some bug fixes.
          > * I'd prefer to name this just Files or perhaps FileContext rather than FilesContext.
          Called it FileContext.
          The term context is appropriate since FileContext is the context in which file names are resolved.
          Many folks confused Files with FileSystem (thinking I really meant FileSystem but since the
          name was taken I called it Files).
          > * FSConfigUtil could simply be called FSConfig. Moreover, its methods could be FileContext methods, no? Why do we need this as a separate class?
          Renamed it.
          Kept it as a separate class because it has nothing to do with FilesContext; it merely sets/gets
          keys from the Config. Besides, other parts of fs may need to access the keys or their default
          values.

          > * Why do we need the inner class FSConfig? Its fields seem like they should be FileContext fields directly, no?
          Fixed it.

          Sanjay Radia made changes -
          Attachment FileContext5.patch [ 12417855 ]
          Doug Cutting added a comment -

          > Kept it as a separate class because it has nothing to do with FilesContext; it merely sets/gets
          > keys from the Config.

          It sets/gets the keys that FileContext uses, no? I suppose others may read them, but, if most file access is through FileContext, then FileContext has a primary association with these keys.

          > Besides other parts of fs may need to access the keys or their default
          > values.

          They're static – other parts of the FS can access them.

          I still don't see why these static methods belong on a separate class when they're so closely tied to FileContext.

          Kan Zhang made changes -
          Link This issue incorporates HDFS-578 [ HDFS-578 ]
          Kan Zhang made changes -
          Link This issue incorporates HDFS-578 [ HDFS-578 ]
          Kan Zhang made changes -
          Link This issue incorporates HDFS-578 [ HDFS-578 ]
          Sanjay Radia added a comment -

          Doug, I would like to keep them in a separate class:

          • Lower layers that need these params should not rely on a higher layer like FileContext. That is a layering violation.
          • The config variable layer cleanup jira (HDFS-531) needs such a class. That Jira may get in before this one - this further argues for the
            separation.
          • We tend to create huge classes with inner classes that are accessed outside. I dislike that. Smaller classes are easier to
            maintain. Further, I would rather rely on creating sub-packages if we need lexical protection.
          Sanjay Radia added a comment -

          The proposed FileContext API, in addition to being a more convenient interface for the application writer, will also simplify the FileSystem layer, freeing that layer from dealing with the notions of the default filesystem (i.e. /) and wd, and also freeing it from having to deal with default values of FS variables like default block size (BS), replication factor (RF), umask, etc.

          The original design was to allow options 1 and 2 below for config values. A few folks felt that we needed to retain the notion of client-side defaults (option 3 below):

          1. Specify the desired values as parameters in the methods such as create. [as before]
          2. Use SS defaults - different FS and their deployments can have their own defaults and also frees the admin from distributing the default config to all client nodes. [New]
          3. Use client-side defaults derived from the config. [as before]

          The uploaded patch attempts to give all 3 options.

          After further thinking, I am proposing that we support only options 1 and 2; not 3.
          Reasons:

          • Simpler config
          • Most folks are going to be happy with SS defaults. The admin has set the defaults based on various considerations including cluster size etc. Further in a federated environment, each cluster may have different defaults that a single client-side config cannot accommodate.
          • Turns out that supporting client-side defaults (option 3) complicates the implementation and forces the FileSystem layer to deal with default values for BS, RF etc. The primary reason for that is that the original design of FileSystem allows each distinct FileSystem class to have its own client-side defaults for BS, RF!!
            • If you look at my patch, you will realize that it is incorrect. When resolving a fully qualified URI it needs to look up the defaults for that specific FileSystem. The patch incorrectly uses the defaults for the default filesystem. Turns out the patch for Symbolic links (HDFS-245) also makes the same mistake when it follows symbolic links to foreign file systems.
            • Yes this is fixable, but will require that the FileSystem layer retain notion of defaults for BS, RF. It is much simpler to let the FileContext be the only layer that needs to deal with the context for /, wd and defaults for BS, RF, etc.
          • I believe the notion of client-side defaults is a bad idea. SS defaults are sufficient, and furthermore they deal with multiple deployments of the same FS while client-side defaults do not.
          Sanjay Radia made changes -
          Attachment FileContext6.patch [ 12418116 ]
          Sanjay Radia added a comment -

          I forgot to mention that the class FsConfig is supposed to be package private (I had accidentally made it public).
          This should clear up the confusion between Doug and me.

          Sanjay Radia made changes -
          Attachment FileContext7.patch [ 12418206 ]
          Tom White added a comment -

          A couple of initial comments:

          • There is a FileContext in the metrics package too, so perhaps we should rename this.
          • What's the util() method for? Why are its methods not first-class methods of the interface? It seems confusing to users.
          Sanjay Radia added a comment -

          > * There is a FileContext in the metrics package too, so perhaps we should rename this.

          Hmm, I didn't know that. FileContext is a very appropriate term since it is the context for resolving file names. The term context is used heavily in the naming literature for that purpose. In our discussion, folks got the whole concept of what I was trying to do as soon as I changed the class name from Files to FileContext. I am tempted to leave it as is since it is in a different package. I don't want to call it FSContext since it is not a context for FileSystems. Strictly speaking, it is a context for file names. Perhaps I should call it PathContext or FileNameContext. Or perhaps rename the metrics one, since it is an internal class? Do you have a preference among the options I suggested? Or another suggestion?

          > * What's the util() method for? Why are its methods not first-class methods of the interface. Seems confusing to users.
          The distinction is that the methods in Util are merely library methods built over the other basic ones.
          The original plan was to move all library methods to FSUtil, but that forced passing an additional parameter (fc) to each of the FSUtil methods. Many disliked that. What I did was a compromise. I don't feel strongly about this either way.
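
          For illustration, the two shapes under discussion look roughly like this (the paths are placeholders, and FsUtil is the hypothetical separate class, not an existing one):

            // Inner Util class bound to a FileContext (what the patch does);
            // the context does not need to be passed explicitly:
            fc.util().copy(new Path("/src/data"), new Path("/dst/data"));

            // Separate utility class (the original plan); every call takes the fc:
            FsUtil.copy(fc, new Path("/src/data"), new Path("/dst/data"));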

          Sanjay Radia added a comment -

          No one has commented on my proposal on the config issue in this jira. As a result, over the last 2 days, I have had a set of discussions with a number of folks at Yahoo, including Doug, and with Dhruba. Here is roughly the set of opinions:

          • Most felt that our config management is a mess and confusing.
          • Everyone likes the notion of Server-side defaults, especially when you consider federated clusters and a URI-based file namespace as explained in this Jira.
          • Some folks were confused about the URI filesystem and how the FileContext lets us deal with URIs in a first-class way. But in the end most felt that it was a good idea. The Unix and scp analogy helped get this across.
          • All agreed that most folks will use the SS defaults most of the time. But there are apps that will specify, for example, the blockSize to override the SS default. They liked that the create() call had a parameter to do that.
          • There were a couple folks who felt strongly that one needs to be able to specify the bytesPerChecksum on the client side (see the related HDFS-578); strongly enough to -1 a proposal that did not allow it. Some felt that we should add an additional parameter to the create call while others felt that we should add an options parameter to the create call.
          • There needs to be an undocumented way to override the SS defaults so that one could test new parameters for SS defaults without reconfiguring the clusters. (Dhruba's suggestion)

          Based on the feedback, a proposal is described below. Note that for some folks parts of this proposal represent a compromise, but they could live with it. The 0.21 deadline is very close and we need to get this in or we will miss it.

          FileContext contains the following items derived from the config:

          • Default fs - /
          • Working dir (derived indirectly via the default file system - details are below)
          • Umask.

          One creates a FileContext as described in the patch (the patch is not up to date with the proposal in this comment).

          • fc = FileContext.getFC()
          • fc = FileContext.getFC(defaultFsUri), etc.

          NO other config parameters are read from the config: the fs client-side config contains only two things, your / and your umask; all defaults will come from the SS. However, users will be able to override these defaults through the options parameter in the create() call when creating a file. So in this proposal there is no way to set application defaults in the config file.
          (Note: we may end up having some undocumented config variables to handle the SS override for testing purposes (Dhruba's request); the exact mechanism is to be determined - I will file a separate jira to discuss it.)

          So the basic calls are:

          • fc.mkdirs(path, perms)
          • fc.create(path, perms, createOpt ...) // note the use of varArgs
          • fc.open(path, bufSize)

          Examples of create using varargs:
          fc.create(path, perms) // all SS
          fc.create(path, perms, CreateOpt.blocksize(4096), CreateOpt.repFac(4));

          Roughly: CreateOpt is a class with several subclasses, one per option (Blocksize, RepFactor, etc.) and a static factory method for each of them, such as CreateOpt.blocksize(long).
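          As a rough illustration only (the names below follow the proposal in this comment and are not a committed API), such a varargs-based option class could look like the sketch below, with a usage example at the end:

            // Hypothetical sketch of the proposed CreateOpt hierarchy; any option the
            // caller leaves unset falls back to the server-side (SS) default of the
            // target file system.
            public abstract class CreateOpt {

              public static final class BlockSize extends CreateOpt {
                private final long value;
                BlockSize(long value) { this.value = value; }
                public long getValue() { return value; }
              }

              public static final class ReplicationFactor extends CreateOpt {
                private final short value;
                ReplicationFactor(short value) { this.value = value; }
                public short getValue() { return value; }
              }

              // Static factory methods, one per option, as described above.
              public static CreateOpt blocksize(long bs) { return new BlockSize(bs); }
              public static CreateOpt repFac(short rf)   { return new ReplicationFactor(rf); }
            }

            // Usage (hypothetical): the create() implementation walks the varargs array
            // and applies whatever options it finds, taking SS defaults for the rest.
            // fc.create(path, perms);                                   // all SS defaults
            // fc.create(path, perms, CreateOpt.blocksize(4096), CreateOpt.repFac((short) 4));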

          Here is the list of options that one will be able to set through the createOptions:

          • progressable - default is null => progress not reported
            • (i.e. a spec default, not a SS default.)
            • Shall we remove progressable?
          • iobufferSize // The rest of the createOptions use SS default if not set
          • replicationFactor
          • blockSize - must be a multiple of bytesPerChecksum and writePacketsize
          • bytesPerChecksum

          The following SS variable is not settable via the createOption.

          • writePacketSize - the SS default is always used.

          If the application desires a particular property it will set it in the createOpt parameters. There is no automatic support to read these app defaults from a config file; this was a deliberate choice.

          The actual mechanism for createOpts is still to be determined, but I am strongly leaning towards varargs rather than an options object with setters and getters.

          So please comment on this proposal ASAP. The above proposal was derived after looking at several alternatives and lots of discussion; thanks to all those who participated.

          ------
          Some details on how wd and home dirs are derived.
          The wd is derived from the default fs; e.g. if the defaultFS is the localFS, the wd of the process is used to initialize the wd. So HDFS could have an SS default for its wd, which would be set to the user's home directory in that cluster. Similarly, the homedir is derived from the defaultFS using the server-side config. (Note: we could have the homedir set on the client side by config vars, but I like the way we currently do this for the local filesystem and it would be consistent to derive it from the SS; hence the home dir in a cluster becomes a property of the cluster's deployment. This also means fewer client-side config variables.)

          dhruba borthakur added a comment -

          +1 for this proposal. I vote that we should remove the Progressable option to the create() call.

          Doug Cutting added a comment -

          > Shall we remove progressable?

          FWIW, this was added in HADOOP-318. We were then seeing tasks timeout due to delays while writing files to HDFS. Are we now confident that a DFS read or write will never take longer than the task timeout?

          dhruba borthakur added a comment -

          My earlier assumption was that the DFSClient was not even updating Progress. But looking at the code, I find that the DFSOutputStream.DataStreamer (which streams packets to the datanode(s) in the pipeline) actually updates the "progress" status after packet sends. In that case, we should keep the Progress parameter in the create call.

          Tsz Wo Nicholas Sze added a comment -

          +1 on CreateOpt + VarArg

          Why is perms not an instance of CreateOpt?

          Kan Zhang made changes -
          Link This issue incorporates HDFS-593 [ HDFS-593 ]
          Tom White added a comment -

          Following the discussion on HDFS-303, we should change the signatures in FileContext to throw exceptions rather than return a boolean for errors. Returning false to indicate an error is dangerous since it is too easily ignored.

          Sanjay Radia added a comment -

          I also dislike operations that return false when there is a failure; I prefer an exception.
          But in some cases the choice is not obvious:
          mkdir(path);

          One can argue that if the path does not exist, mkdir should create it and return true.
          If the path already exists, mkdir returns false (it didn't create the dir, but if your goal was to have the dir exist, then since it exists this is not an exceptional condition).
          If you don't have permission to create the dir, then throw an exception.

          Same for delete.
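          For concreteness, here is a small self-contained illustration (using java.io.File rather than FileContext, and a made-up path) of the two conventions being weighed:

            import java.io.File;
            import java.io.IOException;

            public class MkdirConventions {
              public static void main(String[] args) throws IOException {
                File dir = new File("/tmp/example-dir");  // hypothetical path

                // Convention 1 (java.io.File style): "false" covers both "already exists"
                // and genuine failures, so a caller that ignores the return value can
                // silently miss an error.
                boolean created = dir.mkdir();
                System.out.println("mkdir returned " + created);

                // Convention 2 (exception style): treat "could not create it and it is
                // not already a directory" as an error that must be handled or propagated.
                if (!dir.mkdir() && !dir.isDirectory()) {
                  throw new IOException("could not create directory " + dir);
                }
              }
            }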

          From what I understand, Hadoop has followed the Java file APIs in a few places. (Correct?)
          So far I have focused on getting the skeleton and config-variable issues addressed in my patches.
          This issue was on my list of todos: what do folks want to do - follow the Java file convention, or throw exceptions when the operation
          is not performed?


          BTW there are still several things to be cleaned up.
          E.g. I don't like that getBlockLocations() takes a FileStatus as input and not a pathname.

          Sanjay Radia made changes -
          Attachment FileContext9.patch [ 12418649 ]
          Kan Zhang made changes -
          Link This issue incorporates HDFS-593 [ HDFS-593 ]
          Todd Lipcon added a comment -

          I looked through FileContext9.patch in detail last night. My notes are below. Obviously this is still a preliminary patch, but I wasn't sure I'd have time to look again for a few weeks, so I figured I'd give a detailed review now - feel free to ignore notes about things that are preliminary bits and won't be in the final patch.

          FileContext.java:
          nits:

          • apache license headers
          • typos:
            FileContext.java:22 operationS
            "imples" -> implies
            "imples a default" (extra space)
          • check for more than 80 columns (eg FileContext.java:33, 39)
          • "file related" -> "file-related" (line 56)
          • "all file systems instance" -> "all file system instances"
          • overall "filesystem" seems better to me than "file system"
          • inconsistent use of "you" vs "one" becomes confusing to read
          • Extraneous bit of javadoc left right before FileContext class definition - looks like notes to self.
          • indentation on initFromConfig method is too deep
          • javadoc for getFSofPath is not filled in
          • extra space before "theConfig" on line 195
          • extra ';' on line 209
          • new member variables defined in the middle of the class (LOCAL_FS_URI and defaultPerm)
          • typo: "relavent" line 267
          • javadoc for setWorkingDirectory refers to "newWdir" but the param is called "p". The description should be under the @param p section.
          • indentation on line 513
          • Javadoc for setTimes: I'd rather see "since the epoch" instead of "since Jan 1, 1970".

          code:

          • "conf" is the common name rather than "theConfig"
          • rather than FileSystem.getInitialWorkingDirectory() returning null by default and having the check in FileContext, have it default to just passing through to getHomeDirectory() in FileSystem.java
          • is defaultFsURI.equals(defaultFS.getUri()) an invariant? If so, why are there two separate member variables? If not, when is that not the case?
          • could makeAbsolute be made public? It seems generally useful
          • style: "isNotSchemeWithRelative" should be called "checkNotSchemeWithRelative" - otherwise it looks like it should return a boolean
          • also, that function claims to throw IOException if it's of the wrong type, but in fact it throws IllegalArgumentException.
          • are these kinds of paths ever legal in Hadoop? If not, can this check go into the Path constructor such that we can never end up with an invalid object?
          • getFSofPath seems like a hackish implementation, but unfortunately that's the API that FileSystem gives. Could that function in FileSystem be refactored a bit so there is a more sensical way of checking ownership of a path without having to throw/catch an exception like this?
          • JavaDoc for FileContext() constructor - again this is the normal behavior for hadoop configuration. I think you can leave it at just "using a default Configuration". In fact, I am sort of in favor of removing this constructor entirely and forcing the user to explicitly choose to construct a new Configuration().
          • the static factory methods (getFileContext, getLocalFSFileContext) seem confusing in that they are not singleton instances. I can see this tripping people up - if you think they should be new instances every time it might be worth renaming the methods to "createFileContext" or "newFileContext".
          • I'm not sure I see the point of the factory methods that call setDefaultFileSystem for you - again the "get" name makes it seem like it would be the same instance every time if you call it multiple times on the same FS.
          • Why allow the user to pass either URI or FileSystem instances? There's less code if you just provide one, and the user can always go from one to the other. I'm in favor of fewer code paths where possible.
          • setWorkingDirectory - why not call makeAbsolute here?
          • CreatOpts should be CreateOptions. I think Ken Thompson once remarked that if he had to do anything differently in his past he would have spelled O_CREAT right
          • ReflactionFac -> ReplicationFactor
          • I'm not sold on the varargs/CreatOpts solution. What about having a class using boxed Integers, where "null" means "not set by the user". Then use the method chaining pattern so you can do new CreateOptions().setDefaultReplication(3).setBlockSize(64*1024*1024).etc.etc.etc. Given this structure we could refactor the general FileSystem interface to take that single parameter, rather than the hard-to-migrate lots-of-arguments-to-open route we've got now.
          • If left that way, at least change the sequence of "ifs" to "else if's, and add an else clause at the end with "throw new IllegalArgumentException(...)"
          • mkdirs: this javadoc claims that it returns false if the directory already exists, but in my experience that isn't true, at least with HDFS. I vaguely recall you opened another JIRA about that elsewhere, so maybe out-of-scope for this one.
          • mkdirs: either don't accept null as an argument, or document that null means defaults in the javadoc
          • style: abs_dir -> absDir
          • delete: another maybe out-of-scope thing: I believe delete with recursive=false will still fail even on an empty directory. Would be nice to clear that up before 0.21.
          • rename: would be nice to clarify these semantics as well. Again I think I saw another JIRA about this.
          • rename: rather than comparing URIs for equality, would it be possible to implement FileSystem.equals to default to using URIs, but allow other classes to override?
          • setOwner: some kind of assertion that (username != null || groupname != null)?
          • setVerifyChecksum(bool) - since this isn't Path-specific, it seems odd that it's present here. The pattern imho should be anything that takes a Path should be present in FileContext and forward through to the appropriate FS, but non-Path-specific stuff shouldn't be delegated.
          • the "Not implemented yet" method should just be removed, imo
          • getFSStatus might be better named getFsStatus to match the type. Also I think better to call the no-param getStatus() rather than getStatus(null), stylewise
          • getFSStatus() doesn't makeAbsolute the path
          • Why does the inner class have a redundant reference thisFC? Since it's not a static inner class this reference already exists implicitly. The purpose of this inner class also isn't clear to me.
          • Agree with the "TBD not very clear" on line 1263 - this code path is really hard to follow.
          • Line 1263: should use srcFS.equals(dstFS) here

          overall:

          • I'm not 100% sold that your working directory can be on a different filesystem than the default filesystem. In particular, I think it's unexpected that "foo" and "/foo" might be on different filesystems. Could this be changed such that there's a concept of a "home directory" and "working directory", and you can always "cd home". Then absolute paths and relative paths would always be on the same filesystem as the working directory.
          • I'm unclear of the definition of "Server Side Defaults" from the JavaDoc - are they in fact server side? They're just configurations, right? In general the discussion of Configuration in FileContext seems somewhat extraneous - the behavior here is the same as it is everywhere else in Hadoop, until HDFS-578 is done. If HDFS-578 is a true pre-requisite to this, can you link the tickets?
          • the isNotSchemeWithRelative function is only called in a few places - it's unclear to me whether this is on purpose or not. Maybe this should just be part of makeAbsolute?
          • There's already some confusion/lack of clarity in the Hadoop docs about what "qualified" vs "absolute" means in terms of paths. If the JavaDocs here could help clarify that, I think that would be a big plus. (What's the difference between makeQualified and makeAbsolute?)
          • The "TBD"s should be changed to "TODOs" so that they flag on the Hudson test-patch. And, of course, they should be fixed before this gets committed.
          • A lot of the stuff in util is copied straight from FileSystem.java. This code duplication should be avoided. This also includes DEFAULT_FILTER, GlobFilter below

          FsConfig.java:

          nits:

          • typo: line 10: static method*s*
          • missing apache license
          • "see the note on the javdoc" -> jav*a*doc

          code:

          • the "NOTE" section of the class javadoc doesn't make sense to me. What Files class?
          • What is SERVER_DEFAULT(-1) referring to? I see neither the constant nor a literal -1 anywhere. [edit: looks like this is part of HDFS-578, see above]
          • FS_HOME_DIR default isn't quite correct: it would be /user/<foo> but I don't think you can do a ${...}-style substitution from this context. So the calculation of the default would have to be done in code, or else just not provide a default at all and throw an exception if for some reason it can't be grabbed from the Configuration object.

          • Would be nice to have javadocs for each of the static getters
          • getImplClass: check that the URI has a scheme. If it has no scheme, throw an IllegalArgumentException? Add javadoc to that effect.
          • The class javadoc indicates that there can be instances of this class, but it is entirely made of static methods. Remove that section of the javadoc and make the class abstract and final.

          FileSystem.java:

          • line 544 typo agains -> against
          • typo: absolutPermission -> absolutePermission (perhaps this is a new brand of Hadoop sponsored vodka! +1!)

          FileContextBaseTest.java:

          • line 102: should duplicate the assertion from line 103 before setting the working directory here too - make sure we ended up at the right absolute path by means of the relative "cd"s
          • should test setWorkingDirectory on a nonexistent dir
          • also, all of the setWorkingDirectory calls are called with already-qualified paths. Should try some with non-qualified paths too.
          • test on line 117 confirms my earlier comment that mkdirs returns true if you call it on an already-existing directory
          Raghu Angadi added a comment -

          * I'm not sold on the varargs/CreatOpts solution. What about having a class using boxed Integers, where "null" means "not set by the user". Then use the method chaining pattern so you can do new CreateOptions().setDefaultReplication(3).setBlockSize(64*1024*1024).etc.etc.etc. Given this structure we could refactor the general FileSystem interface to take that single parameter, rather than the hard-to-migrate lots-of-arguments-to-open route we've got now.

          +1. I always assumed CreateOpts object would be created and then passed as single argument. +1 for chaining pattern to set arguments.

          Chris Douglas added a comment -

          I always assumed CreateOpts object would be created and then passed as single argument.

          This was also my reaction, but I was persuaded that the inevitable component that passed CreateOptions objects around would be more confusing than forcing the params into the caller. Of course, nothing prevents that hypothetical component from building and passing an array of CreateOpts (or an ArrayList<CreateOpts>) through a tangle of functions and ultimately to create, but at least the API is clear. Given that the latter is available for anyone convinced that they need to pass a set of CreateOpts through several layers, the single-object CreateOpts seems to offer few advantages other than keystrokes and the impossibility of duplicate entries in the varargs.

          (Todd: for readability, would you mind trying to compose feedback in a less verbose format? Noting that spelling errors, indentation, and grammar need cleanup in a preliminary patch need not take a dozen bullets; critiques of specific variable names drown out more important points and can be generalized)

          Todd Lipcon added a comment -

          Given that the latter is available for anyone convinced that they need to pass a set of CreateOpts through several layers, the single-object CreateOpts seems to offer few advantages other than keystrokes and the impossibility of duplicate entries in the varargs.

          My issue is that the multi-argument form pushes the handling of defaults down into the FS implementation, whereas it was my understanding from this patch that the defaults should be handled by FileContext (or another FS-independent class). If there is a single-object CreateOpts, it could either have the defaults preset by its constructor, or have getters that returned defaults when the variables had not been set.
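          As a sketch of that single-object alternative (illustrative names only; this is not the API that was ultimately adopted), the chaining pattern with boxed fields meaning "not set" could look like:

            // Hypothetical builder-style options object: a null field means
            // "not set by the user", so a default can be filled in elsewhere.
            public class CreateOptions {
              private Integer replication;
              private Long blockSize;

              public CreateOptions setReplication(int replication) {
                this.replication = replication;
                return this;  // return this to allow method chaining
              }

              public CreateOptions setBlockSize(long blockSize) {
                this.blockSize = blockSize;
                return this;
              }

              public Integer getReplication() { return replication; }  // null => use default
              public Long getBlockSize()      { return blockSize; }    // null => use default
            }

            // Usage (hypothetical):
            // fc.create(path, perms,
            //     new CreateOptions().setReplication(3).setBlockSize(64L * 1024 * 1024));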

          Todd: for readability, would you mind trying to compose feedback in a less verbose format?

          Yes, I apologize for having such a lengthy/detail-oriented review of a preliminary patch. Given the 0.21 freeze is coming up soon, and my next two weeks are pretty busy, I wanted to get all my comments out there now in case I didn't have time to look again. For future reviews I'll try to more cleanly delineate the detail-oriented things from more important macro issues. A proper code review tool hooked up to JIRA would be invaluable here.

          Chris Douglas added a comment -

          My issue is that the multi-argument form pushes the handling of defaults down into the FS implementation, whereas it was my understanding from this patch that the defaults should be handled by FileContext (or another FS-independent class).

          Since the defaults are conceptually server-side, I'm assuming that a FileContext 1) may not know them and 2) won't keep its own set of defaults as state. The intent, I think, is to let these be configurable by passing them through FileContext and into the implementation, which may elect to disregard some options (e.g. replication on local). None of the parameters for create really need getters; it would be an odd client that needs some ratio of the cluster's default block size, rather than a specific value coherent for the caller.

          Sanjay Radia added a comment -

          A fair number of updates and cleanup since Friday (is this why it is called Labor Day weekend?).
          I glanced over Todd's comments - it turns out I had already fixed many. I will respond to/fix his feedback
          in the next couple of days.
          Note 2 matching patches for common and hdfs.

          Sanjay Radia made changes -
          Attachment FileContext-common10.patch [ 12418854 ]
          Attachment FileContext-hdfs10.patch [ 12418855 ]
          Todd Lipcon added a comment -

          I don't feel that strongly about the CreateOpts thing. It might result in a lot of duplicate code across FS implementations (the loop-through-and-populate-variables pattern) but if people think it's clearer than a single object, +1.

          Sanjay Radia made changes -
          Attachment FileContext-common11.patch [ 12418974 ]
          Attachment FileContext-hdfs11.patch [ 12418975 ]
          Sanjay Radia made changes -
          Link This issue blocks HDFS-610 [ HDFS-610 ]
          Sanjay Radia added a comment -

          @todd

          >rather than FileSystem.getInitialWorkingDirectory() returning null by default and having the check in FileContext, have it default to just passing through to getHomeDirectory() in FileSystem.java

          I went back and forth on this one. I decided that the filesystem layer needs to be dumb and return information to the upper layer (FileContext), letting it set defaults as needed.

          >could makeAbsolute be made public? It seems generally useful
          Most apps will want to use the public makeQualified(). Absolute paths cannot be exchanged across FileContexts and hence can lead to potential confusion about the closure (i.e. context) for pathname resolution.
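          To illustrate the distinction (hypothetical host name and paths; this assumes makeAbsolute resolves against the working directory without adding a scheme, while makeQualified also fills in the scheme and authority of the default file system):

            // Assume fc is a FileContext whose default file system is
            // hdfs://nn.example.com:8020 and whose working directory is /user/joe.
            Path relative = new Path("data/part-0000");

            // Absolute form (no scheme): /user/joe/data/part-0000
            // Its meaning still depends on which FileContext later resolves it.

            // Qualified form: hdfs://nn.example.com:8020/user/joe/data/part-0000
            // Unambiguous even if the Path is handed to a different FileContext.
            Path qualified = fc.makeQualified(relative);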

          >are these kinds of paths [relative paths with schemes] ever legal in Hadoop? If not, can this check go into the Path constructor such that we can never end up with an invalid object?
          Such paths do not make sense for FileContext, were redundant for the existing FileSystem, and will not be allowed for the new AbstractFileSystem (HADOOP-6223). Shells do allow them but probably should not. So until we replace FileSystem with AbstractFileSystem we cannot change the spec for Path. As far as the shell goes, we should probably issue a warning when such names are used; btw this may break some of our shell scripts.

          > in favor of removing this constructor entirely and forcing the user to explicitly choose to construct a new Configuration().
          I disagree - most users should not need to know about config; the default config from the environment should be good enough.
          Further, as we migrate towards little or no client-side config (this jira has moved most config vars to the SS), the role of the config becomes less important. Besides MR and test programs, I don't see many use cases for an app using anything other than the default config.

          > Why allow the user to pass either URI or FileSystem instances? There's less code if you just provide one, and the user can always go from one to the other. I'm in favor of fewer code paths where possible.
          The new AbstractFileSystem that replaces FileSystem will have a protected constructor. So one cannot create a FileSystem using its URI. Only tests will need the static factory method with a FileSystem as a parameter. Most apps will use the factory methods that use default config or one where a URI of the default FileSystem is passed.

          > A lot of the stuff in util is copied straight from FileSystem.java. This code duplication should be avoided.
          FileSystem will be deprecated, removed and replaced by AbstractFileSystem (see HADOOP-6223).
          Hence these utility methods will exist only in the FileContext.

          Sanjay Radia made changes -
          Attachment FileContext-common12.patch [ 12419145 ]
          Doug Cutting added a comment -
          @InterfaceStability.Evolving /* soon to be stable after release */

          This should be marked as Stable in the patch then, no? Folks should only use releases, not trunk, and we don't want to have to change all of the new APIs from Evolving to Stable when we branch for a release. In other words, it's okay to incompatibly change an API that's marked Stable if it's never been released. Does that make sense?

          On the other hand, since this is a new API, it might make sense to leave it as Evolving for a release, so that we have a chance to improve it in case we find problems with it.
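          For reference, a minimal example of how such an annotation is applied (this assumes the org.apache.hadoop.classification package; the exact annotations on the committed class may differ):

            import org.apache.hadoop.classification.InterfaceStability;

            // Kept Evolving for the release that first ships it, so the API can still be
            // changed incompatibly if problems surface, then promoted to Stable later.
            @InterfaceStability.Evolving
            public class FileContext {
              // ...
            }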

          Sanjay Radia added a comment -

          @doug.
          My reason was: mark it Evolving for a release in case we have changes (i.e. your "on the other hand").

          The 1.0 rules do not apply right now; with the current Hadoop rules everything is evolving.
          So "legally" we could say Stable and still change it, but I prefer the way I have proposed.

          Sanjay Radia added a comment -

          Minor changes in the patch.
          All/most of my patches have used 2 new protected methods to support
          FileContext. From my understanding, the objections raised in the Rename jira would apply to these.
          Hence we are exploring a significantly different implementation strategy.

          I request the community to continue reviewing the FileContext API in this
          patch (but ignore the changes to the lower levels such as in FileSystem).
          Please do the review with urgency since there is a chance (though now slim) of
          committing this jira by the deadline.

          Sanjay Radia made changes -
          Attachment FileContext-common13.patch [ 12419429 ]
          Sanjay Radia added a comment -

          There have been two comments (Tom and Sanjay) on methods that return boolean (mkdir and delete).
          If you care about this issue please add your opinion.

          Sanjay Radia added a comment -

          I propose that we change the create and mkdirs APIs as follows.

          • create - will not create missing directories
          • createRecursive - will create missing parent directories (i.e. like FileSystem#create).
          • mkdir - will not create missing directories
          • mkdirRecursive (or mkdirs) - will create missing parent directories (i.e. like FileSystem#mkdirs)

          Create and mkdir are atomic; the recursive versions are not guaranteed to be atomic. (Separate discussion on whether or not
          HDFS will continue to support the recursive versions atomically.)

          There is a jira (my search attempts failed to find it) that has argued that a non-recursive create is important for MR.

          IMHO the recursive create and recursive mkdirs of FileSystem were a big mistake; we should have stuck to the Unix spec; every diversion from
          the Unix spec should be a very conscious decision.

          Kan Zhang made changes -
          Link This issue is related to HDFS-617 [ HDFS-617 ]
          Kan Zhang made changes -
          Link This issue is related to HDFS-618 [ HDFS-618 ]
          Doug Cutting added a comment -

          > My reason was: mark it evolving for a release in case we have changes (i.e your "on the other hand").

          That's fine. I was confused by the comment. Let's change that to say "for a release" then.

          Sanjay Radia made changes -
          Attachment FileContext-common14.patch [ 12419588 ]
          Jakob Homan added a comment -

          Glancing through the patch, at lines 2071 and 2072 one of those calls should be using dst rather than src when checking the scheme.

          Sanjay Radia made changes -
          Attachment FileContext-common15.patch [ 12419722 ]
          Sanjay Radia made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12419722/FileContext-common15.patch
          against trunk revision 815524.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 70 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          -1 javac. The applied patch generated 171 javac compiler warnings (more than the trunk's current 145 warnings).

          -1 findbugs. The patch appears to introduce 1 new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/37/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/37/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/37/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/37/console

          This message is automatically generated.

          Sanjay Radia made changes -
          Attachment FileContext-common16.patch [ 12419729 ]
          Sanjay Radia made changes -
          Attachment FileContext-common15.patch [ 12419722 ]
          Sanjay Radia made changes -
          Attachment FileContext-common18.patch [ 12419748 ]
          Sanjay Radia made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Sanjay Radia made changes -
          Attachment FileContext-common18.patch [ 12419750 ]
          Sanjay Radia made changes -
          Attachment FileContext-common18.patch [ 12419750 ]
          Sanjay Radia made changes -
          Attachment FileContext-common18.patch [ 12419748 ]
          Sanjay Radia made changes -
          Attachment FileContext-common18.patch [ 12419751 ]
          Sanjay Radia made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Sanjay Radia made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Sanjay Radia made changes -
          Attachment FileContext-common18.patch [ 12419751 ]
          Sanjay Radia made changes -
          Attachment FileContext-common18.patch [ 12419752 ]
          Sanjay Radia made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12419751/FileContext-common18.patch
          against trunk revision 815524.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          -1 javac. The applied patch generated 171 javac compiler warnings (more than the trunk's current 145 warnings).

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/38/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/38/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/38/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/38/console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12419752/FileContext-common18.patch
          against trunk revision 815524.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The applied patch generated 171 javac compiler warnings (more than the trunk's current 145 warnings).

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/5/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/5/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/5/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/5/console

          This message is automatically generated.

          Sanjay Radia made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Sanjay Radia made changes -
          Attachment FileContext-common19.patch [ 12419781 ]
          Sanjay Radia made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12419781/FileContext-common19.patch
          against trunk revision 815809.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The applied patch generated 171 javac compiler warnings (more than the trunk's current 145 warnings).

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/39/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/39/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/39/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/39/console

          This message is automatically generated.

          Sanjay Radia made changes -
          Attachment FileContext-common20.patch [ 12419801 ]
          Sanjay Radia made changes -
          Attachment FileContext-common20.patch [ 12419801 ]
          Sanjay Radia made changes -
          Attachment FileContext-common21.patch [ 12419803 ]
          gary murry made changes -
          Link This issue blocks HADOOP-6260 [ HADOOP-6260 ]
          Suresh Srinivas made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Suresh Srinivas made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12419803/FileContext-common21.patch
          against trunk revision 815809.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/40/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/40/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/40/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/40/console

          This message is automatically generated.

          Sanjay Radia made changes -
          Link This issue relates to HADOOP-6223 [ HADOOP-6223 ]
          Sanjay Radia added a comment -

          Minor comments and spacing changes from the previous version.

          Sanjay Radia made changes -
          Attachment FileContext-common22.patch [ 12419842 ]
          Suresh Srinivas added a comment -

          Some minor comments:

          1. Revert src/java/org/apache/hadoop/fs/kfs/KosmosFileSystem.java, since the patch only re-indents that code.
          2. Remove the TBD comments; also, the exceptions thrown by the methods should be specified more precisely.
          3. FileContext.java - all static member variables should move to the beginning of the class (they are currently interleaved with methods).
          4. Some dependencies on 303 mentioned in the code should be changed to HADOOP-6240.

          These are minor comments; they can be addressed when the additional changes for the new rename functionality are made.

          +1 for the patch.

          Arun C Murthy added a comment -

          Ok, I finally got around to looking at this jira...

          Given the discussion around the 2 options in HADOOP-6223, I'm assuming this patch goes for a less ambitious change to get FileContext in before the feature freeze, since we can't quite seem to agree on the newer file-system interface for file-system implementers?

          Chris Douglas added a comment -
          • The javadoc will not render correctly, for FileContext in particular. This can be cleaned up later, but the docs for a replacement API should be well proofed for the release.
          • FileContext::getDefaultFileSystem is temporary?
          • Can FileContext::isFile and FileContext::isDirectory be removed? These are redundant with getFileStatus

          +1 Overall, though. This is a reasonable path to the new API.

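          (Editorial illustration.) A minimal sketch of what callers would do without the convenience methods, using getFileStatus and the existing FileStatus#isDir() accessor; the helper and path names are made up:

              // Sketch only: distinguishing file vs. directory via getFileStatus
              // instead of FileContext#isFile / #isDirectory. Names are made up.
              static boolean isDirectory(FileContext fc, Path p) throws IOException {
                FileStatus status = fc.getFileStatus(p);  // throws FileNotFoundException if p does not exist
                return status.isDir();
              }
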
          dhruba borthakur added a comment -

          The latest patch looks good. A few minor comments and questions:

          • The primitiveCreate() and primitiveMkdir() methods seem to be there only for backward compatibility and would probably go away in a future release. Can they be marked as Deprecated?
          • Some file systems like LocalFileSystem have an initial workingDir. Can you please explain why this is so? What is the value of this initial working directory?
          • In FileContext.java, there are comments such as "TBD - explain more here.", "@see xxx for details", "TBD Also the exceptions thrown by", and "TBD for HADOOP-6223".
          • Would applications need to access FileContext.makeAbsolute()? Can this method be made public?
          • Path.isAbsolute() and Path.isPathComponentAbsolute() look strikingly similar to me. Do we need both?

          There are some places where the "Deprecated" tag is not in place because it generates HudsonQA warnings. How about if we tag the appropriate methods as Deprecated and then knowingly ignore the "-1 from HudsonQA for javac warnings"?

          Doug Cutting added a comment -

          > How about if we tag the appropriate methods as Deprecated and then knowingly ignore the "-1 from HudsonQA for javac warnings"?

          Alternatively, one can use @SuppressWarnings("deprecation") to silence these if the uses are few.

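          (Editorial illustration.) A minimal sketch of that approach; the helper method is made up, while Path#makeQualified(FileSystem) is the method deprecated by this patch:

              // Sketch only: suppress the deprecation warning at a call site that must
              // still use the old method, instead of adding to the javac warning count.
              @SuppressWarnings("deprecation")
              static Path qualify(Path p, FileSystem fs) {
                return p.makeQualified(fs);
              }
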
          Sanjay Radia added a comment -

          On absolute paths etc:
          > Would applications need to access FileContext.makeAbsolute()? Can this method be made public?

          Okay, here is how I have been using the terms. It seems to agree with what is in Path:
          Fully qualified path - has the scheme and authority, and the path component is absolute
          Absolute path: /foo
          Relative path: foo (i.e. relative to the working directory)

          Applications often want to convert a path to a fully qualified one if they want to pass it around or store it in a file.
          In a URI-based file name space it is very dangerous to take an absolute path (i.e. without the scheme and authority part)
          and pass it around or store it in a file. If someone
          uses a different context then there can be closure confusion. (The same is true in Unix - if you store working-directory-relative names in a file or pass them around there can be closure confusion.)

          So we don't need to expose FileContext#makeAbsolute() right now - we can add it later if there is a use case.

          (BTW, FileContext#makeAbsolute() was a bad method name - its comment and implementation state that it merely resolves the relative part.
          I have changed the method name to reflect that.)

          > Path.isAbsolute() and Path.isPathComponentAbsolute() look strikingly similar to me. Do we need both?
          It turns out the old Path#isAbsolute() was really doing Path#isPathComponentAbsolute().
          But I had to leave Path#isAbsolute()'s implementation unchanged, since I did not know whether ALL the callers intended the semantics of isAbsolute() or isPathComponentAbsolute().

          I had intended to file a jira to explore this and fix it if necessary. If it turns out that all we need is isPathComponentAbsolute()
          then we should deprecate isAbsolute(); besides, its implementation is incorrect.
          But if there are use cases for isAbsolute() then we should fix its implementation and manage the change in spec.
          Sorry, my mistake not to have filed the jira ahead of time for clarity.

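          (Editorial illustration.) The three forms above as org.apache.hadoop.fs.Path values; the namenode address and directory names are made up:

              // Illustrative only; host, port, and paths are made up.
              Path fullyQualified = new Path("hdfs://namenode.example.com:8020/user/alice/data");
              Path absolute       = new Path("/user/alice/data");   // no scheme or authority
              Path relative       = new Path("data");               // resolved against the working directory
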
          Todd Lipcon added a comment -

          Fully qualified path - has the scheme and authority, and the path component is absolute
          Absolute path: /foo
          Relative path: foo (i.e. relative to the working directory)

          Can we document these in the javadoc for Path? I've always found this confusing, and I think having a clear definition in the code will be very helpful moving forward.

          Sanjay Radia added a comment -

          Douglas > FileContext::getDefaultFileSystem is temporary?
          It is for tests. Changed it to protected and added stability tags.
          > Can FileContext::isFile and FileContext::isDirectory be removed? These are redundant with getFileStatus
          These are convenience methods. The old API supports them and they are used extensively.

          Sanjay Radia added a comment -

          > The primitiveCreate(), primitiveMkdir() methods seem to have there only for backward compatibility and would probably go away in a future release. Can it be marked as Deprecated?

          These are newly added protected methods to support FileContext on top of FileSystem (nothing to do with backward compatibility).
          They will be removed when HADOOP-6223 completes.

          dhruba borthakur added a comment -

          Thanks for the comments, Sanjay. Looks good to me. +1

          Sanjay Radia added a comment -

          The patch addresses the issues raised.
          Path#makeQualified(FileSystem f) is now deprecated. We will work with Hudson to overlook
          the warnings (as Dhruba suggested).
          I will also file a jira to fix the code that calls this deprecated method.

          Sanjay Radia made changes -
          Attachment FileContext-common24.patch [ 12419920 ]
          Doug Cutting added a comment -

          > These are newly added protected methods to support FileContext on top of FileSystem (nothing to do with backward compatibility).

          Still, marking them as deprecated seems wise, since we don't want folks to start calling them, do we?

          Ravi Phulari made changes -
          Link This issue blocks HADOOP-6261 [ HADOOP-6261 ]
          Arun C Murthy added a comment -

          +1, we seem to have consensus about the approach to get this in.

          Sanjay Radia made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Sanjay Radia made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Sanjay Radia made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Sanjay Radia added a comment -

          @doug > Still, marking them as deprecated seems wise ...
          Done. New patch uploaded.

          Sanjay Radia made changes -
          Attachment FileContext-common25.patch [ 12419927 ]
          Sanjay Radia made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12419927/FileContext-common25.patch
          against trunk revision 815809.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The applied patch generated 174 javac compiler warnings (more than the trunk's current 145 warnings).

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/41/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/41/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/41/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/41/console

          This message is automatically generated.

          Sanjay Radia made changes -
          Link This issue relates to HADOOP-6265 [ HADOOP-6265 ]
          gary murry added a comment -

          +1, since there is a plan in place to address the javac issues. Normally it would be good to address them in the same patch that generates them, but in the interest of keeping this patch as concise as possible, this is a fine solution in the case of deprecating a method.

          Suresh Srinivas made changes -
          Hadoop Flags [Reviewed]
          Release Note Add new improved file system interface FileContext for the application writer.
          Issue Type Improvement [ 4 ] New Feature [ 2 ]
          Fix Version/s 0.21.0 [ 12313563 ]
          Component/s fs [ 12310689 ]
          Suresh Srinivas added a comment -

          I just committed the change. Thank you Sanjay.

          Suresh Srinivas made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #35 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/35/)
          HADOOP-4952. Add new improved file system interface FileContext for the application writer. Contributed by Sanjay Radia.

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk #101 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/101/)
          HADOOP-4952. Add new improved file system interface FileContext for the application writer. Contributed by Sanjay Radia.

          Sanjay Radia made changes -
          Link This issue relates to HADOOP-6271 [ HADOOP-6271 ]
          Hudson added a comment -

          Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #35 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/35/)

          Hudson added a comment -

          Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #11 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/11/)

          Robert Chansler added a comment -

          Editorial pass over all release notes prior to publication of 0.21.

          Robert Chansler made changes -
          Release Note Add new improved file system interface FileContext for the application writer. New FileContext API introduced to replace FileSystem API. FileContext will be the version-compatible API for future releases. FileSystem API will be deprecated in the next release.
          Sanjay Radia made changes -
          Link This issue is depended upon by HADOOP-6356 [ HADOOP-6356 ]
          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Sanjay Radia
            • Reporter:
              Sanjay Radia
            • Votes:
              0
            • Watchers:
              27
