dhruba borthakur added a comment - 31/Aug/09 08:05 AM
There are many other parameters that the client can default to the server settings. For example, io.file.buffer.size. What if we allow an option for the client to fetch a subset of the configuration from the Namenode and then use that and then overlay the client-side hdfs-site.xml?
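The overlay idea above can be sketched as a simple map merge. This is only an illustration under stated assumptions: the class `ConfigOverlay`, the method `overlay`, and the sample values are hypothetical and not part of any Hadoop API; the real client would read its settings from hdfs-site.xml and fetch the server subset over RPC.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a Hadoop class: merges two sets of settings.
class ConfigOverlay {

    /**
     * Starts from the subset of settings fetched from the Namenode and
     * overlays the client-side settings, so any key the client sets wins.
     */
    static Map<String, String> overlay(Map<String, String> serverSubset,
                                       Map<String, String> clientSite) {
        Map<String, String> effective = new HashMap<>(serverSubset);
        effective.putAll(clientSite); // client-side entries override server ones
        return effective;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Map<String, String> server = new HashMap<>();
        server.put("io.file.buffer.size", "4096");
        server.put("io.bytes.per.checksum", "512");

        Map<String, String> client = new HashMap<>();
        client.put("io.file.buffer.size", "65536");

        System.out.println(overlay(server, client));
    }
}
```

With this ordering, a key that appears only on the server keeps the server value, while a key set in the client-side file keeps the client value.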
Yes, io.file.buffer.size is passed in to the DFSOutputStream ctor, but it doesn't seem to be used in any way. However, there are 2 other values used by DFSOutputStream that are chosen by the client, i.e., io.bytes.per.checksum and dfs.write.packet.size. My current thinking is that if the client chooses to use the server default for blockSize, it should use the server defaults for io.bytes.per.checksum and dfs.write.packet.size at the same time. What do you think?
Proposal: BlockSize (BS) and RepF (RF) are either obtained from the server side (SS) or specified by the application in its call to create.
The bytesPerChecksum and packetSize are always SS (i.e., cannot be specified on the client side); i.e., remove this flexibility from the current system.

> My current thinking is if the client chooses to use server default for blockSize, it should use server defaults for io.bytes.per.checksum and dfs.write.packet.size at the same time. What do you think?
Sounds good to me.

> The bytesPerChecksum and packetSize are always SS

The default value of 512 is suitable for random reads, isn't it? If an application knows that it does not need random read support for a file, it can specify the bytesPerChecksum to be larger than the default. Don't we want to allow that flexibility?

I was proposing that we remove that flexibility. How many other file systems let you set bytesPerChecksum?
Do you expect a significant performance difference here?

> Do you expect a significant performance difference here?

Good point. I am not really aware of any objective benchmark in this regard. Not sure what performance impact this might have.

> was proposing that we remove that flexibility

What is the simplification if we remove this flexibility? Can you please explain? Thanks.

This is closely tied to
It also proposes that there are no per-FileSystem defaults, only per-deployment SS defaults (see What it means is that one does not need fs config variables except for the default fs in the config.

I suggest we bundle the 3 params (blockSize, bytesPerChecksum and writePacketSize) together. If a client decides to use the server default for blockSize, then all 3 will use server defaults. If a (sophisticated) client decides not to use the server default for blockSize, then the client has to provide valid values for all 3 params. In particular, the validation that blockSize matches bytesPerChecksum is done on the client side, since currently bytesPerChecksum is not sent to the namenode on the create() call, but I'm open to adding this param to the create() call so that it can be validated on the namenode. Thoughts?
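The bundling rule above can be sketched roughly as follows. Everything here is hypothetical: `WriteParams` and `resolve` are illustration names, a `null` client choice stands in for "use server defaults", and "blockSize matches bytesPerChecksum" is assumed to mean that blockSize is a whole multiple of bytesPerChecksum.

```java
// Hypothetical value class bundling the 3 params; not a Hadoop class.
class WriteParams {
    final long blockSize;
    final int bytesPerChecksum;
    final int writePacketSize;

    WriteParams(long blockSize, int bytesPerChecksum, int writePacketSize) {
        this.blockSize = blockSize;
        this.bytesPerChecksum = bytesPerChecksum;
        this.writePacketSize = writePacketSize;
    }

    /**
     * All-or-nothing choice: a null clientChoice means the client takes the
     * server defaults for all 3 params; otherwise the client must supply
     * valid values for all 3, which are checked on the client side.
     */
    static WriteParams resolve(WriteParams clientChoice, WriteParams serverDefaults) {
        if (clientChoice == null) {
            return serverDefaults;
        }
        validate(clientChoice);
        return clientChoice;
    }

    /** Client-side validation; the multiple-of rule is an assumption. */
    static void validate(WriteParams p) {
        if (p.blockSize <= 0 || p.bytesPerChecksum <= 0 || p.writePacketSize <= 0) {
            throw new IllegalArgumentException("all 3 params must be positive");
        }
        if (p.blockSize % p.bytesPerChecksum != 0) {
            throw new IllegalArgumentException(
                "blockSize (" + p.blockSize + ") and bytesPerChecksum ("
                + p.bytesPerChecksum + ") do not match");
        }
    }
}
```

If bytesPerChecksum were added to the create() call as suggested, the same `validate` check could run on the namenode instead.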
Attached a patch that allows users to use server default values when creating a file (by specifying -1). The 4 params (blockSize, bytesPerChecksum, writePacketSize and io.file.buffer.size) are bundled together. If the user chooses to use server default for blockSize, all 4 params will use server defaults. DFSClient caches server default values for 1 hour and fetches them from server only when necessary.
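A minimal sketch of the one-hour caching described above, assuming a generic fetch callback in place of the actual RPC to the server. The class `CachedDefaults`, its fields, and the injected clock are hypothetical names for illustration and are not taken from the patch.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical cache wrapper; T stands in for whatever object holds the server defaults.
class CachedDefaults<T> {
    static final long VALIDITY_MS = 60L * 60L * 1000L; // 1 hour

    private final Supplier<T> fetcher;   // stands in for the call to the server
    private final LongSupplier clock;    // current time in milliseconds
    private T cached;
    private long fetchedAt;
    private boolean hasValue;

    CachedDefaults(Supplier<T> fetcher, LongSupplier clock) {
        this.fetcher = fetcher;
        this.clock = clock;
    }

    /** Returns the cached value, refetching only if none exists or it is older than 1 hour. */
    synchronized T get() {
        long now = clock.getAsLong();
        if (!hasValue || now - fetchedAt > VALIDITY_MS) {
            cached = fetcher.get();
            fetchedAt = now;
            hasValue = true;
        }
        return cached;
    }
}
```

In ordinary use the clock argument would be `System::currentTimeMillis`; passing it in makes the expiry easy to test without waiting.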
On a separate note, the client can choose io.file.buffer.size and pass it to the create() call, but it is actually ignored in the current implementation.

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12418460/h578-13.patch against trunk revision 810631.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The patch appears to cause tar ant target to fail.
-1 findbugs. The patch appears to cause Findbugs to fail.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/1/testReport/

This message is automatically generated.

Ignore the above Hadoop QA comment. This patch can be run without
I ran run-test-hdfs and it passed. Here is the test-patch result.
[exec] +1 overall.

> I ran run-test-hdfs ...

Hi Kan, you have to run "ant test", which includes all the tests. "run-test-hdfs" does not include the fault injection tests.

Per request from Sanjay, this patch will only add the method getServerDefaults() to DFSClient and will not otherwise change any existing behavior of DFSClient. Allowing users to specify -1 for server defaults will be done in
+1

Changed to incompatible change since it increments the client protocol version.
Integrated in Hadoop-Hdfs-trunk-Commit #21 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/21/)
Add support for new FileSystem method for clients to get server defaults. Contributed by Kan Zhang.

Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #5 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/5/)
Add support for new FileSystem method for clients to get server defaults. Contributed by Kan Zhang.

Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #21 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/21/)

Editorial pass over all release notes prior to publication of 0.21.