No one has commented on my proposal for the config issue in this jira. As a result, over the last 2 days I have had discussions with a number of folks at Yahoo, including Doug, and with Dhruba. Here, roughly, are the opinions:
- Most felt that our config management is a mess and confusing.
- Everyone likes the notion of server-side (SS) defaults, especially when you consider federated clusters and a URI-based file namespace as explained in this jira.
- Some folks were confused about the URI filesystem and how the FileContext lets us deal with URIs in a first class way. But in the end most felt that it was a good idea. The unix and scp analogy helped get this across.
- All agreed that most folks will use the SS defaults most of the time. But there are apps that will specify, for example, the blockSize to override the SS default. They liked that the create() call had a parameter to do that.
- There were a couple folks who felt strongly that one needs to be able to specify the bytesPerChecksum on the client side (see the related HDFS-578); strongly enough to -1 a proposal that did not allow it. Some felt that we should add an additional parameter to the create call while others felt that we should add an options parameter to the create call.
- There needs to be an undocumented way to override the SS defaults so that one could test new parameters for SS defaults without reconfiguring the clusters. (Dhruba's suggestion)
Based on the feedback, a proposal is described below. Note that for some folks parts of this proposal represent a compromise, but they could live with it. The 21 deadline is very close and we need to get this in or we will miss it.
FileContext contains the following items derived from the config:
- Default fs - /
- Working dir (derived indirectly via the default file system - details are below)
One creates FileContext as described in the patch (the patch is not up to date with the proposal in this comment).
- fc = FileContext.getFC()
- fc = FileContext.getFC(defaultFsUri), etc.
NO other config parameters are read from the config: the fs client-side config contains only two things, your / and your umask; all defaults will come from SS. However, users will be able to override these defaults through the options parameter in the create() call when creating a file. So in this proposal there is no way to set application defaults in the config file.
(Note: we may end up having some undocumented config variables to handle the SS override for testing purposes (Dhruba's request); the exact mechanism is to be determined. I will file a separate jira for discussing this one.)
So the basic calls are:
- fc.mkdirs(path, perms)
- fc.create(path, perms, createOpt ...) // note the use of varArgs
- fc.open(path, bufSize)
Examples of create using varargs:
fc.create(path, perms) // all SS defaults
fc.create(path, perms, CreateOpt.blocksize(4096), CreateOpt.repFac(4));
Roughly: CreateOpt is a class with several subclasses, one per option (Blocksize, RepFactor etc) and a static factory method for each of them such as CreateOpt.blocksize(long).
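To make the shape of CreateOpt a bit more concrete, here is a rough sketch (not the patch). Only CreateOpt.blocksize(long) and CreateOpt.repFac(...) come from the text above; the nested class names, the int type for repFac, and the find/resolveBlockSize/resolveRepFac helpers are assumptions made for illustration. The helpers show one way a create() implementation could fall back to the SS default when the caller does not pass an option.

```java
import java.util.Optional;

// Illustrative sketch only; names other than blocksize()/repFac() are hypothetical.
abstract class CreateOpt {
    private CreateOpt() {}

    // One subclass per option.
    static final class BlockSize extends CreateOpt {
        final long value;
        private BlockSize(long value) { this.value = value; }
    }

    static final class RepFactor extends CreateOpt {
        final int value;
        private RepFactor(int value) { this.value = value; }
    }

    // Static factory methods, one per option.
    static CreateOpt blocksize(long bytes) { return new BlockSize(bytes); }
    static CreateOpt repFac(int replicas) { return new RepFactor(replicas); }

    // Returns the last option of the given type, if the caller passed one.
    static <T extends CreateOpt> Optional<T> find(Class<T> type, CreateOpt... opts) {
        T found = null;
        for (CreateOpt opt : opts) {
            if (type.isInstance(opt)) {
                found = type.cast(opt);
            }
        }
        return Optional.ofNullable(found);
    }

    // A client-supplied option wins; otherwise the server-side default is used.
    static long resolveBlockSize(long ssDefault, CreateOpt... opts) {
        return find(BlockSize.class, opts).map(o -> o.value).orElse(ssDefault);
    }

    static int resolveRepFac(int ssDefault, CreateOpt... opts) {
        return find(RepFactor.class, opts).map(o -> o.value).orElse(ssDefault);
    }
}
```

With this layout, adding a new option later means adding one subclass and one factory method, without changing the create() signature.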
Here is the list of options that one will be able to set through the createOptions:
- progressable - default is null => progress not reported (i.e. a spec default, not an SS default).
- Shall we remove progressable?
- iobufferSize // The rest of the createOptions use SS default if not set
- blockSize - must be a multiple of bytesPerChecksum and writePacketSize
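The blockSize constraint could be checked along the following lines. This is only a sketch: the class and method names are hypothetical, the numbers in the usage note are illustrative values rather than actual defaults, and where the check would live (client or server) is not decided in this proposal.

```java
// Hypothetical helper; not part of the patch.
final class BlockSizeCheck {
    private BlockSizeCheck() {}

    // True when blockSize is a positive multiple of both bytesPerChecksum
    // and writePacketSize.
    static boolean isValidBlockSize(long blockSize, int bytesPerChecksum, int writePacketSize) {
        if (blockSize <= 0 || bytesPerChecksum <= 0 || writePacketSize <= 0) {
            return false;
        }
        return blockSize % bytesPerChecksum == 0 && blockSize % writePacketSize == 0;
    }
}
```

For example, isValidBlockSize(64L * 1024 * 1024, 512, 64 * 1024) holds, while isValidBlockSize(4096, 512, 64 * 1024) does not, because 4096 is not a multiple of the packet size used here.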
The following SS variable is not settable via the createOption.
- writePacketSize - the SS default is always used.
If the application desires a particular property it will set it in the createOpt parameters. There is no automatic support to read these app defaults from a config file; this was a deliberate choice.
The actual mechanism for createOpts is still to be determined, but I am strongly leaning towards varargs rather than an options object with setters and getters.
So please comment on this proposal ASAP. The above proposal was derived after looking at several alternatives and lots of discussions; thanks to all those who participated.
Some details on how wd and home dirs are derived.
The wd is derived from the default fs; e.g. if the defaultFS is localFS, the wd of the process is used to initialize the wd. So HDFS could have an SS default for its wd, which would be set to the user's home directory in that cluster. Similarly, the homedir is derived from the defaultFS using server-side config. (Note: we could have the homedir set on the client side by config vars, but I like the way we currently do this for the local filesystem, and it would be consistent to derive it from the SS; hence the home dir in a cluster becomes a property of the cluster's deployment. This also means fewer client-side config variables.)