Doug Cutting added a comment - 07/Oct/08 05:35 PM
It would be easier to add authentication to RPC, where we need it anyway, than to have to add it to the socket-level API too. So maybe it's time to finally experiment with HDFS data access over RPC?
Moving data access to an authentication-enabled RPC interface doesn't solve the authorization issue we are trying to address here. Even if the DataNodes can authenticate users, they still need to know whether a user has the right to access certain data blocks. Such authorization information can only come from the NameNode.
> Moving data access to an authentication-enabled RPC interface doesn't solve the authorization issue we are trying to address here.
True, but it's a start. It would move us onto a single authentication mechanism, and that will simplify authorization. If we're using Kerberos for authentication, we might use tickets whose authorization data contains a list of block ids.
> If we're using Kerberos for authentication, we might use tickets whose authorization data contains a list of block ids.
Yes, that is an option. However, I think it's better for Hadoop to do it in an authentication-independent way. 1) Not being tied to Kerberos tickets allows us to accommodate other authentication mechanisms if needed. 2) It gives us flexibility, and we are not constrained by the peculiarities of the Kerberos implementation, such as authorization field size, expiration and renewal requirements, etc. More importantly, authorization tickets can be issued at a more convenient time after authentication and without a dependency on the Kerberos KDC.
> I think it's better for Hadoop to do it in an authentication-independent way.
Okay. So clients would still probably get some sort of signed ticket from the Namenode that encodes the blocks which may be accessed, along with a timeout, etc. They'd pass this to datanodes with block requests, and the datanode would validate its signature before using it. Is something like that what you have in mind?
Are you trying to do authorization without authentication? This seems to me theoretically impossible. Could you explain your design more?
> Are you trying to do authorization without authentication?
No. There are 3 parties here. Authentication is done at NN. When the client comes to DN, all DN needs to know is that the requested operation has been authorized by NN.
> This seems to me theoretically impossible.
I plan to introduce an HDFS token, called Access Token, as a vehicle to pass data access authorization information from NN to DN. One can think of Access Tokens as capabilities; an Access Token enables its owner to access certain data blocks. It is issued by NN and used on DN. Access Tokens should be generated in such a way that their authenticity can be verified by DN.
In general, tokens can be generated in 2 ways.
A) Using a public-key scheme, where NN chooses a private/public key pair and uses the private key to sign a token. The signature becomes an integral part of the token. DN is given NN's public key, which can be used to verify the signature associated with a token. Since only the NN knows the private key, only the NN can generate a valid token.
B) Using a symmetric-key scheme, where NN and all DNs share a secret key. For each token, the NN computes a keyed hash (also known as a message authentication code or MAC) as the token authenticator. The token authenticator becomes an integral part of the token. When a DN receives a token, it uses its copy of the secret key to re-compute the token authenticator and compares it with the one submitted as part of the token. If they match, the token is verified as authentic. Since only NN and DNs know the key (DNs are trusted to never issue tokens; they only use the key to verify tokens they receive), no third party can forge tokens.
Method A has the advantage that DN doesn't have to store any secret key, and it provides stronger security in the sense that even if a DN is compromised, the attacker still can't forge tokens. However, generating and verifying public-key signatures is expensive compared to symmetric-key operations. I plan to use method B to generate Access Tokens.
Access Tokens are ideally non-transferable, i.e., only the owner can use them. This means we don't have to worry if a token gets stolen, for example during transit. One way to make a token non-transferable is to include the owner's id in the token and require whoever uses the token to authenticate herself as the owner specified in the token. I plan to simply include the owner's id in the token for now, and DN doesn't verify it. Authentication and verification of the owner id can be added later if needed.
Access Tokens are meant to be lightweight and short-lived. There is no need to renew or revoke an Access Token. When a cached Access Token expires, simply get a new one. Access Tokens should be cached only in memory and never written to disk.
A typical use case is as follows. An HDFS client asks NN for block ids/locations for a file. NN verifies that the client is authorized to access the file and sends back block ids/locations along with an Access Token for each block. Whenever the HDFS client needs to access a block, it sends the block id along with its associated Access Token to a DN. DN verifies the Access Token before allowing access to the block. The HDFS client may cache Access Tokens received from NN in memory and only get new tokens from NN when the cached ones expire or when accessing non-cached blocks.
An Access Token will look like the following, where access mode can be read, write, replicate, etc.
An Access Token is valid on all DNs regardless of where the data block is actually stored. The secret key used to compute the token authenticator is randomly chosen by the NN and sent to DNs when they first register with the NN. There is a key rolling mechanism that updates this key on NN and pushes the new key to DNs at regular intervals.
I uploaded a preliminary patch to get some early reviews. It's not complete. In particular, only READ and WRITE operations are changed to use access tokens for now, and I have yet to add unit tests. But it should give you a fairly good idea of how access tokens are to be used.
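For readers who want to see the symmetric-key scheme (method B) in concrete terms, here is a minimal sketch of issuing a token on the NN side and verifying it on the DN side. It is not taken from the patch. The field layout (expiry, owner id, block id, access mode), the choice of HMAC-SHA256, the 600-second lifetime, and the function names `issue_token` and `verify_token` are all assumptions made for illustration.

```python
import hashlib
import hmac
import os
import struct
import time


def _token_id(expiry, owner, block_id, mode):
    """Serialize the token fields into the bytes covered by the MAC.

    The layout is hypothetical. Variable-length fields are length-prefixed
    so that two different tokens can never serialize to the same bytes.
    """
    owner_b = owner.encode("utf-8")
    mode_b = mode.encode("utf-8")
    return (struct.pack(">qqH", expiry, block_id, len(owner_b)) + owner_b
            + struct.pack(">H", len(mode_b)) + mode_b)


def issue_token(key, owner, block_id, mode, lifetime_s=600, now=None):
    """NN side: build a token and attach its authenticator (keyed hash)."""
    now = int(time.time()) if now is None else now
    expiry = now + lifetime_s
    mac = hmac.new(key, _token_id(expiry, owner, block_id, mode),
                   hashlib.sha256).digest()
    return {"expiry": expiry, "owner": owner, "block_id": block_id,
            "mode": mode, "mac": mac}


def verify_token(key, token, block_id, mode, now=None):
    """DN side: re-compute the authenticator, then check expiry and scope."""
    now = int(time.time()) if now is None else now
    expected = hmac.new(
        key,
        _token_id(token["expiry"], token["owner"],
                  token["block_id"], token["mode"]),
        hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, token["mac"]):
        return False
    if now >= token["expiry"]:
        return False
    # The owner id is carried in the token but, as in the proposal,
    # it is not verified here.
    return token["block_id"] == block_id and token["mode"] == mode


# The NN picks a random secret and shares it with DNs (distribution not shown).
shared_key = os.urandom(32)
token = issue_token(shared_key, "alice", 1234, "READ")
allowed = verify_token(shared_key, token, 1234, "READ")
```

Key rolling is not modeled in this sketch. One way to support it would be to carry a key id in the token so a DN can pick the matching key, but that is a guess about the design rather than something stated in this thread.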
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12406413/at31.patch against trunk revision 768376.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 15 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 4 new Findbugs warnings.
+1 Eclipse classpath. The patch retains Eclipse classpath integrity.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/246/testReport/
This message is automatically generated.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12406807/at36.patch against trunk revision 770044.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 15 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 Eclipse classpath. The patch retains Eclipse classpath integrity.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/263/testReport/
This message is automatically generated.
+1. This went through a couple of iterations of review.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12407587/at37.patch against trunk revision 772960.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 15 new or modified tests.
-1 patch. The patch command could not apply the patch.
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/314/console
This message is automatically generated.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12407801/at38.patch against trunk revision 774018.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 15 new or modified tests.
-1 patch. The patch command could not apply the patch.
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/328/console
This message is automatically generated.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12407938/at40.patch against trunk revision 774232.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 15 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 Eclipse classpath. The patch retains Eclipse classpath integrity.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/333/testReport/
This message is automatically generated.
I just committed this. Thanks Kan!
Editorial pass over all release notes prior to publication of 0.21.