I haven't read the new design yet. FWIW, I just wanted to add a note on dealing with access token errors, based on past experience. Access tokens are used as capability tokens for accessing datanodes, and they are checked only during pipeline setup. A client can't generate access tokens by itself; tokens have to be obtained from the namenode (or from a datanode during block recovery). An access token has an expiration date, and since a client may cache access tokens, an access token error may occur during pipeline setup due to an expired token. Specifically, we have the following situations.
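To make the model above concrete, here is a minimal sketch (invented class and method names, not the actual HDFS token classes) of a token that carries an expiration date and is validated only at pipeline setup time:

```java
// Hypothetical sketch, not the real HDFS implementation: an access token
// carries an expiration time and is checked only when a pipeline is set up.
public class AccessTokenSketch {

    /** Simplified stand-in for a block access token. */
    static class AccessToken {
        final long blockId;
        final long expiryMillis;

        AccessToken(long blockId, long expiryMillis) {
            this.blockId = blockId;
            this.expiryMillis = expiryMillis;
        }

        boolean isExpired(long nowMillis) {
            return nowMillis >= expiryMillis;
        }
    }

    /** Datanode-side check performed only at pipeline setup time. */
    static boolean validateAtPipelineSetup(AccessToken token, long nowMillis) {
        // A real datanode would also verify the token against a secret shared
        // with the namenode; here we only model the expiration check.
        return !token.isExpired(nowMillis);
    }

    public static void main(String[] args) {
        AccessToken cached = new AccessToken(42L, 1_000L);
        System.out.println(validateAtPipelineSetup(cached, 500L));   // still valid
        System.out.println(validateAtPipelineSetup(cached, 2_000L)); // expired in cache
    }
}
```

The point of the sketch is just that a token can sit in the client's cache past its expiration date, so the failure surfaces at the next pipeline setup rather than at token issue time.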
1) Pipeline setup for writing to a new block. In this case, any error (including an access token error) will cause the client to abandon the block and obtain a new blockID, along with a newly generated access token, from the namenode before re-establishing the pipeline. Hence, there is no need to worry about expired access tokens when re-establishing the pipeline.
2) A pipeline for writing has been successfully set up, but an error occurs during writing. In this case, the client needs to go through a block recovery process and subsequently re-establish the pipeline. At this point, a previously used access token may have expired. The idea behind HDFS-195 is that the block recovery process always returns a newly generated access token that the client can use to re-establish the pipeline. One nice thing about using a newly generated access token is that if a subsequent access token error occurs, we can conclude that the complaining datanode is misbehaving and exclude it from the set of targets in the next retry.
3) Initial pipeline setup for an append operation. Since in the current implementation (before the new design in this JIRA) the client goes through the same block recovery process as in case 2) before setting up the pipeline, it receives a newly generated access token as a result and uses it to set up the pipeline, so the same arguments apply.
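The retry reasoning in cases 2) and 3) can be sketched as follows. This is a hedged illustration with invented names (`Datanode`, `rebuildPipeline`), not actual HDFS client code: because block recovery hands back a known-fresh token, a datanode that still rejects it can be blamed and left out of the next pipeline.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the case-2 retry logic: the token is freshly
// generated by block recovery, so any rejection implicates the datanode,
// which is then excluded from the next set of targets.
public class PipelineRetrySketch {

    interface Datanode {
        /** Returns true iff this datanode accepts the token at pipeline setup. */
        boolean acceptToken(String token);
    }

    /**
     * Re-establish the pipeline using a token freshly generated by block
     * recovery; datanodes that reject a fresh token are excluded.
     */
    static List<Datanode> rebuildPipeline(List<Datanode> targets, String freshToken) {
        List<Datanode> pipeline = new ArrayList<>();
        for (Datanode dn : targets) {
            if (dn.acceptToken(freshToken)) {
                pipeline.add(dn);
            }
            // else: the token cannot be expired, so this datanode is
            // misbehaving; leave it out rather than retrying against it.
        }
        return pipeline;
    }

    public static void main(String[] args) {
        Datanode good = token -> true;
        Datanode bad = token -> false; // rejects even a fresh token
        List<Datanode> pipeline = rebuildPipeline(List.of(good, bad, good), "fresh");
        System.out.println(pipeline.size()); // the misbehaving node is excluded
    }
}
```

The design choice this illustrates: issuing a fresh token on every recovery turns an ambiguous failure (expired token vs. bad datanode) into an unambiguous one, which is what lets the client safely exclude the complaining node.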