ZK seems good for storing a small thing that does not change. Will the key be generally available if it's in ZK?
This will rely on securing ZK with Kerberos auth (Eugene has a patch for ZOOKEEPER-938) and setting up ACLs. We already need ZK secured, since we use it to broadcast ACL changes to the RSs, so this fits with that too. It goes without saying, but using ZK security will be an optional config from the HBase standpoint.
We'll want to periodically roll master keys and communicate updates to RSs. Hadoop rolls the "current" key every 24 hrs and keeps the last 7, so ZK again seems a good fit to communicate the changes. I considered just storing the key IDs in ZK for change notification and storing key data in HDFS using file permissions for security, but that's just another piece that can break when we're securing ZK anyway.
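To make the rolling scheme concrete, here is a minimal in-memory sketch of "roll the current key, keep the last N". The class and method names (MasterKeyRoller, roll, isKnown) are hypothetical and not existing HBase or Hadoop classes, and publishing the keys to ZK is left out entirely. The retention count is a parameter; 7 would mirror the Hadoop behavior described above.

```java
import java.security.SecureRandom;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: a current master key plus a bounded history of older keys. */
public class MasterKeyRoller {
    /** Illustrative key record; not an HBase class. */
    public static final class MasterKey {
        private final int id;
        private final byte[] secret;

        MasterKey(int id, byte[] secret) {
            this.id = id;
            this.secret = secret;
        }

        public int getId() { return id; }
        public byte[] getSecret() { return secret.clone(); }
    }

    private final int retained;                 // how many keys to keep, newest first
    private final Deque<MasterKey> keys = new ArrayDeque<>();
    private final SecureRandom random = new SecureRandom();
    private int nextId = 1;

    public MasterKeyRoller(int retained) {
        if (retained < 1) {
            throw new IllegalArgumentException("retained must be >= 1");
        }
        this.retained = retained;
    }

    /** Generates a new current key and drops the oldest keys beyond the retention limit. */
    public synchronized MasterKey roll() {
        byte[] secret = new byte[32];
        random.nextBytes(secret);
        MasterKey key = new MasterKey(nextId++, secret);
        keys.addFirst(key);
        while (keys.size() > retained) {
            keys.removeLast();
        }
        return key;
    }

    /** Returns the newest key, or null if roll() has never been called. */
    public synchronized MasterKey current() {
        return keys.peekFirst();
    }

    /** True if a key with this ID is still within the retained set. */
    public synchronized boolean isKnown(int id) {
        for (MasterKey k : keys) {
            if (k.getId() == id) {
                return true;
            }
        }
        return false;
    }

    public synchronized int size() {
        return keys.size();
    }
}
```

In a real deployment roll() would be driven by a timer (every 24 hrs, say) and each new key written to a secured znode so the RSs pick it up via a watch.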
When would you need this? [token renewal]
Hadoop again does this. I think the jobtracker is designated as the token "renewer", and it pings the NN to keep the token alive for up to 7 days. In that case, each token has a "max date", but expiration is computed separately as current time + some window. Expiration is a bit fuzzy in that implementation though, as the renewer can still resurrect an expired token if the current time < the "max date" in the token. In theory, this limits the window during which disclosure of a token allows impersonating the user. Say the token expires after 24 hours without renewal and the MR job completes in less than that time, so the JT never needs to renew it. A disclosure of the token 25 hrs after issue then comes too late: the token has already expired and can't be re-used to impersonate the user.
However, this doesn't really close the loop if you can somehow trick the JT (the designated "renewer") into resurrecting the expired token for you. Also, we can't use the built-in JT renewal as it only works for Hadoop DelegationTokens, so something else would have to handle renewal for the duration of a job execution. And it's not clear to me that it's a meaningful enhancement in security. So I've ignored the expiration/max date distinction and just given each token a single expire date.
On failover, the master would read in the current master keys from ZK and repopulate the valid tokens in memory as they're used, validating that they use an existing master key and haven't expired, err.. "maxed out", yet.
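For reference, the expiration vs. "max date" behavior described above can be sketched roughly as follows. This is my reading of the semantics, not Hadoop's actual code; RenewableToken and its fields are hypothetical names, and times are plain epoch millis passed in by the caller.

```java
/** Hypothetical sketch of a token with a rolling expiration capped by a fixed max date. */
public class RenewableToken {
    private final long maxDate;        // hard upper bound, fixed at issue time
    private final long renewWindowMs;  // how far each renewal pushes the expiration out
    private long expiration;           // current expiration, never later than maxDate

    public RenewableToken(long issueTime, long renewWindowMs, long maxLifetimeMs) {
        this.renewWindowMs = renewWindowMs;
        this.maxDate = issueTime + maxLifetimeMs;
        this.expiration = Math.min(issueTime + renewWindowMs, maxDate);
    }

    /** A token is usable only while the current time is before its expiration. */
    public synchronized boolean isValid(long now) {
        return now < expiration;
    }

    /**
     * Renewal as described above: it succeeds even for an already expired token,
     * as long as the max date has not passed. Returns false once maxed out.
     */
    public synchronized boolean renew(long now) {
        if (now >= maxDate) {
            return false;
        }
        expiration = Math.min(now + renewWindowMs, maxDate);
        return true;
    }

    public synchronized long getExpiration() {
        return expiration;
    }
}
```

With a 24 hr window and a 7 day max lifetime, a token issued at time 0 is invalid at hour 25, but a renew() call at hour 25 still succeeds and makes it valid again, which is the "resurrection" case.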
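A rough sketch of what that validation could look like, assuming the token password is an HMAC of the token identifier under the referenced master key. That assumption, the identifier layout (user|keyId|expireDate), and the names TokenValidator, loadKey, sign and validate are all illustrative; nothing here is an existing HBase API. The key map stands in for whatever the master reads back from ZK.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch: stateless token validation against a set of master keys. */
public class TokenValidator {
    private static final String ALGORITHM = "HmacSHA256";

    // Master keys by ID; in the design above these would be reloaded from ZK.
    private final Map<Integer, byte[]> masterKeys = new ConcurrentHashMap<>();

    public void loadKey(int keyId, byte[] secret) {
        masterKeys.put(keyId, secret.clone());
    }

    /** Computes the token password for an identifier (assumed layout: user|keyId|expireDate). */
    public static byte[] sign(byte[] secret, String user, int keyId, long expireDate) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(secret, ALGORITHM));
            String identifier = user + "|" + keyId + "|" + expireDate;
            return mac.doFinal(identifier.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC computation failed", e);
        }
    }

    /** Valid only if the master key is known, the token is unexpired, and the password matches. */
    public boolean validate(String user, int keyId, long expireDate, byte[] password, long now) {
        byte[] secret = masterKeys.get(keyId);
        if (secret == null) {
            return false;   // key unknown or already rolled out of the retained set
        }
        if (now >= expireDate) {
            return false;   // past the single expire date
        }
        byte[] expected = sign(secret, user, keyId, expireDate);
        return MessageDigest.isEqual(expected, password);
    }
}
```

Because the check only needs the master keys, a new master does not have to recover any per-token state after taking over; tokens can be re-admitted lazily as clients present them.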
What's this? We need a name for the cluster instance? I suppose we can't use master IP plus port because it could change with time. The ZK ensemble string plus the ZK rootdir?
Yeah, this part is a bit tricky. I hadn't thought of cluster ensemble subsets. I was going to ping JD on whether replication has anything it uses for similar purposes – uniquely identifying clusters to prevent replication loops, say. Talking with Andy, he suggested generating a UUID on initial FS setup and adding it to hbase.version. From there the master could pop it up in ZK on startup? Maybe I should open a separate JIRA for discussing that bit?
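The UUID idea could look something like the sketch below: generate the ID once, persist it, and return the stored value on every later startup. The local filesystem stands in for HDFS here, and the ClusterId class, the readOrCreate method and the separate ID file are assumptions for illustration (the suggestion above was to put it in hbase.version). Pushing the value into ZK is not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical sketch: a cluster ID created once at FS setup and reused afterwards. */
public class ClusterId {
    /** Returns the stored ID, creating and persisting a new random UUID if none exists. */
    public static String readOrCreate(Path idFile) {
        try {
            if (Files.exists(idFile)) {
                byte[] raw = Files.readAllBytes(idFile);
                return new String(raw, StandardCharsets.UTF_8).trim();
            }
            String id = UUID.randomUUID().toString();
            Files.write(idFile, id.getBytes(StandardCharsets.UTF_8));
            return id;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Since the ID lives with the data rather than with the master or the ZK ensemble, it stays stable across master restarts, address changes, and ZK reconfiguration.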
Thanks for the comments! I suppose I should clarify some of these bits on the wiki page.