More details on the delegation token design.
Overview

After initial authentication to NN using Kerberos credentials, a user may obtain a delegation token, which can be given to user jobs for subsequent authentication to NN as the user. The token is in fact a secret key shared between the user and NN and should be protected when passed over insecure channels. Anyone who gets it can impersonate the user on NN. Note that a user can only obtain new tokens after authenticating using Kerberos.

When a user obtains a delegation token from NN, the user should tell NN who the designated token renewer is. The designated renewer should authenticate to NN as itself when renewing the token for the user. Renewing a token means extending the validity period of that token on NN. No new token is issued. The old token continues to work.

To let a Map/Reduce job use a delegation token, the user needs to designate JT as the token renewer. All the Tasks of the same job use the same token. JT is responsible for keeping the token valid until the job is finished. After that, JT may optionally cancel the token.

Design

Here is the format of a delegation token.

TokenID = {ownerID, renewerID, issueDate, maxDate}
TokenAuthenticator = HMAC(masterKey, TokenID)
Delegation Token = {TokenID, TokenAuthenticator}
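
The construction above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Hadoop's implementation: the choice of HMAC-SHA1, the JSON encoding of TokenID, and the names `TokenID`, `serialize`, `compute_authenticator`, and `make_token` are all hypothetical and chosen only for the example.

```python
import hashlib
import hmac
import json
import os
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TokenID:
    owner_id: str
    renewer_id: str
    issue_date: int  # seconds since the epoch
    max_date: int    # the token cannot be renewed past this time


def serialize(token_id: TokenID) -> bytes:
    # Assumed encoding; any deterministic encoding would serve here.
    return json.dumps(asdict(token_id), sort_keys=True).encode("utf-8")


def compute_authenticator(master_key: bytes, token_id: TokenID) -> bytes:
    # TokenAuthenticator = HMAC(masterKey, TokenID)
    # HMAC-SHA1 is an assumption; the text does not name a hash function.
    return hmac.new(master_key, serialize(token_id), hashlib.sha1).digest()


def make_token(master_key: bytes, token_id: TokenID):
    # Delegation Token = {TokenID, TokenAuthenticator}
    return token_id, compute_authenticator(master_key, token_id)


master_key = os.urandom(32)  # stands in for NN's randomly chosen masterKey
tid = TokenID("alice", "jt", issue_date=1_000, max_date=1_000 + 7 * 86_400)
token_id, authenticator = make_token(master_key, tid)
```

Because the authenticator is a deterministic function of masterKey and TokenID, whoever holds masterKey can recompute it from the TokenID alone, while a party without masterKey cannot forge it.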

NN chooses masterKey randomly and uses it to generate and verify delegation tokens. NN keeps all active tokens in memory and associates each token with an expiryDate. If currentTime > expiryDate, the token is considered expired and any client authentication request using the token will be rejected. Expired tokens will be deleted from memory. A token is also deleted from memory when the owner or the renewer cancels the token.

Using Delegation Token

When a client (e.g., a Task) uses a delegation token to authenticate, it first sends TokenID to NN (but never sends the associated TokenAuthenticator to NN). TokenID identifies the token the client intends to use. Using TokenID and masterKey, NN can re-compute TokenAuthenticator and hence the token. NN checks if the token is valid. A token is valid if and only if the token exists in memory and currentTime < expiryDate associated with the token. If the token is valid, the client and NN will try to authenticate each other using their own copy of TokenAuthenticator as the secret key and DIGEST-MD5 as the authentication mechanism.

Token Renewal

Delegation tokens need to be renewed periodically to keep them valid. Suppose JT is the designated renewer for a token. During renewal, JT authenticates to NN as JT. After successful authentication, JT sends the token to be renewed to NN. NN verifies that 1) JT is the renewer specified in TokenID, 2) TokenAuthenticator is correct, and 3) currentTime < maxDate specified in TokenID. Upon successful verification, if the token exists in memory, which means the token is currently valid, NN sets its new expiryDate to min(currentTime+renewPeriod, maxDate). If the token doesn't exist in memory, which indicates NN has restarted and therefore lost memory of all previously stored tokens, NN adds the token to memory and sets its expiryDate similarly. The latter case allows jobs to survive NN restarts. All JT has to do is renew all tokens with NN after NN restarts and before relaunching failed Tasks.
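
The validity check and the three renewal checks can be sketched as a toy model of NN's in-memory token table. Everything here is an assumption made for illustration: the class `TokenStore`, its method names, the tuple encoding of TokenID, and the use of HMAC-SHA1 are hypothetical and do not reflect Hadoop's actual code.

```python
import hashlib
import hmac


class TokenStore:
    """Toy model of NN's in-memory token table (hypothetical names)."""

    def __init__(self, master_key: bytes, renew_period: int):
        self.master_key = master_key
        self.renew_period = renew_period
        self.expiry = {}  # TokenID tuple -> expiryDate

    def authenticator(self, token_id: tuple) -> bytes:
        # TokenAuthenticator = HMAC(masterKey, TokenID); encoding is assumed.
        msg = "|".join(map(str, token_id)).encode("utf-8")
        return hmac.new(self.master_key, msg, hashlib.sha1).digest()

    def is_valid(self, token_id: tuple, now: int) -> bool:
        # Valid iff the token exists in memory and currentTime < expiryDate.
        expiry = self.expiry.get(token_id)
        return expiry is not None and now < expiry

    def renew(self, caller: str, token_id: tuple, auth: bytes, now: int) -> int:
        _owner_id, renewer_id, _issue_date, max_date = token_id
        if caller != renewer_id:                       # check 1
            raise PermissionError("caller is not the designated renewer")
        if not hmac.compare_digest(auth, self.authenticator(token_id)):  # check 2
            raise PermissionError("TokenAuthenticator is incorrect")
        if not now < max_date:                         # check 3
            raise PermissionError("token is past maxDate")
        # The same rule applies whether or not the token is already in
        # memory, which is what lets tokens survive an NN restart.
        self.expiry[token_id] = min(now + self.renew_period, max_date)
        return self.expiry[token_id]


nn = TokenStore(b"k" * 32, renew_period=100)
tid = ("alice", "jt", 0, 150)  # (ownerID, renewerID, issueDate, maxDate)
auth = nn.authenticator(tid)
e1 = nn.renew("jt", tid, auth, now=10)   # min(10 + 100, 150)
e2 = nn.renew("jt", tid, auth, now=100)  # min(100 + 100, 150)

# After a restart NN still has the masterKey but an empty token table.
restarted = TokenStore(b"k" * 32, renew_period=100)
before = restarted.is_valid(tid, now=120)
restarted.renew("jt", tid, auth, now=120)
after = restarted.is_valid(tid, now=120)
```

In this sketch the token is unknown to the restarted store until the designated renewer renews it, after which it is valid again; a caller other than the renewer, or a renewal attempted at or after maxDate, is rejected.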

Note that the designated renewer can revive an expired (or canceled) token by simply renewing it, if currentTime < maxDate specified in the token. This is because NN can't tell the difference between a token that has expired (or has been canceled) and a token that is not in memory because NN restarted. Since only the designated renewer can revive an expired (or canceled) token, this doesn't seem to be a security problem. An attacker who steals the token can't renew or revive it.

The masterKey needs to be updated periodically. NN only needs to persist the masterKey on disk, not the tokens.

An additional benefit of using Hadoop proprietary delegation tokens for delegation, as opposed to using Kerberos TGT/Service tickets, is that Kerberos is only used at the "edge" of Hadoop. Delegation tokens don't depend on Kerberos and can be coupled with non-Kerberos authentication mechanisms (such as SSL) used at the edge.

For all Hadoop services except NN, we simply use Kerberos. For NN, we complement Kerberos with a second mechanism called DIGEST-MD5 (available from the Java SASL library). A client can authenticate to NN in two ways.