According to the "worst" jstack of a busy namenode I took, there were 29 BLOCKED handler threads. All of them were blocked on the UGI class lock. Here is the breakdown:
- 2 ensureInitialized() - from non-static synchronized methods.
HADOOP-9748 will unblock these.
- 27 getCurrentUser()
Among the 27 threads that were blocked at getCurrentUser():
- 18 FSPermissionChecker() - from FSNamesystem#getPermissionChecker() in most namenode RPC methods
- 8 BlockTokenSecretManager#generateToken() - getBlockLocations()
- 1 NameNodeRpcServer.mkdirs
I think FSPermissionChecker can be modified to be created with a passed-in UGI. FSNamesystem can use the one already stored in the RPC server by calling getRemoteUser(). This will eliminate the bulk of the getCurrentUser() calls from namenode RPC handlers. A similar change can be made to mkdirs. Block token generation is not as straightforward, but even without it we can eliminate the majority of the calls.
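
To make the proposal concrete, here is a rough sketch of the idea. It is not the actual Hadoop code: Ugi, RpcServer, PermissionChecker and Namesystem are hypothetical stand-ins for UserGroupInformation, the RPC server, FSPermissionChecker and FSNamesystem, and the counter exists only to show when the class lock is taken. It assumes the RPC layer already keeps the caller's UGI per handler thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for UserGroupInformation (not the real class).
final class Ugi {
    // Counts how often the class lock is taken; for illustration only.
    static final AtomicInteger classLockAcquisitions = new AtomicInteger();
    private final String userName;

    Ugi(String userName) { this.userName = userName; }

    String getUserName() { return userName; }

    // static synchronized: every caller contends on the Ugi class lock.
    static synchronized Ugi getCurrentUser() {
        classLockAcquisitions.incrementAndGet();
        return new Ugi("login-user");
    }
}

// Hypothetical stand-in for the RPC server, which already knows the caller.
final class RpcServer {
    private static final ThreadLocal<Ugi> CALL_USER = new ThreadLocal<>();

    static void setCallUser(Ugi ugi) { CALL_USER.set(ugi); }

    // Thread-local read; no class lock involved.
    static Ugi getRemoteUser() { return CALL_USER.get(); }
}

// Hypothetical stand-in for FSPermissionChecker.
final class PermissionChecker {
    private final String user;

    // Existing style: looks up the user itself and takes the class lock.
    PermissionChecker() { this(Ugi.getCurrentUser()); }

    // Proposed style: the caller hands in the UGI it already has.
    PermissionChecker(Ugi ugi) { this.user = ugi.getUserName(); }

    String getUser() { return user; }
}

// Hypothetical stand-in for FSNamesystem#getPermissionChecker().
final class Namesystem {
    static PermissionChecker getPermissionChecker() {
        Ugi remote = RpcServer.getRemoteUser();
        // Fall back to the locked path only outside an RPC call.
        return remote != null ? new PermissionChecker(remote)
                              : new PermissionChecker();
    }
}
```

With this shape, a handler thread that is serving an RPC never reaches the synchronized getCurrentUser(); only calls made outside an RPC context still go through it.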