We've seen users run into a problem where the RM stores so many delegation tokens in the ZKRMStateStore that the listing of those znodes exceeds the jute buffer. This is fine during normal operation, but becomes a problem on failover because the RM tries to read in all of the token znodes (i.e. it calls getChildren on the parent znode). This is particularly bad because everything appears to be okay until a failover occurs, at which point you end up with no active RMs.
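For a rough sense of scale, here is a back-of-the-envelope sketch of how many child names fit in one getChildren response. It assumes the usual ZooKeeper default for jute.maxbuffer (0xfffff bytes), assumes each child name is serialized as a 4-byte length plus its bytes, and ignores any other response overhead. The class and method names are made up for illustration, and the 24-byte name length is just an example (an 18-character prefix plus a 6-digit sequence number).

```java
public final class JuteBufferEstimate {
    // Assumed default for jute.maxbuffer; clusters may configure a different value.
    static final long DEFAULT_JUTE_MAX_BUFFER = 0xfffff;

    // Rough upper bound on the number of children one listing can carry,
    // assuming a 4-byte count for the list and a 4-byte length per name.
    static long maxChildren(long bufferBytes, int nameBytes) {
        if (bufferBytes < 4 || nameBytes <= 0) {
            throw new IllegalArgumentException("invalid sizes");
        }
        return (bufferBytes - 4) / (4 + nameBytes);
    }

    public static void main(String[] args) {
        // Example name length: 18-character prefix + 6-digit sequence number.
        System.out.println(maxChildren(DEFAULT_JUTE_MAX_BUFFER, 24));
    }
}
```

Under these assumptions the limit is in the tens of thousands of tokens, which a busy cluster can plausibly exceed.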
There was a similar problem with the YARN application data that was fixed in YARN-2962 by adding a (configurable) hierarchy of znodes so the RM could pull subchildren without overflowing the jute buffer (though it's off by default).
We should add a hierarchy similar to that of YARN-2962, but for the delegation token znodes.
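One possible shape for such a hierarchy, as a minimal sketch only: split the last few characters of each token znode name off into a child znode, so that no single parent holds more than 10^splitIndex children. The class name, the root path, the "RMDelegationToken_" prefix, the zero-padding to four digits, and the 0 to 4 range for the split index are all assumptions made for illustration, not the actual ZKRMStateStore layout or the scheme used in YARN-2962.

```java
public final class TokenZnodePaths {
    // Hypothetical znode name prefix, used only for this illustration.
    private static final String PREFIX = "RMDelegationToken_";

    // Builds the znode path for a token. splitIndex is the number of trailing
    // characters moved into a child znode; 0 keeps the flat layout.
    static String pathFor(String root, int sequenceNumber, int splitIndex) {
        if (sequenceNumber < 0) {
            throw new IllegalArgumentException("sequenceNumber must be >= 0");
        }
        if (splitIndex < 0 || splitIndex > 4) {
            throw new IllegalArgumentException("splitIndex must be in [0, 4]");
        }
        // Zero-pad so there are always at least 4 digits available to split.
        String name = PREFIX + String.format("%04d", sequenceNumber);
        if (splitIndex == 0) {
            return root + "/" + name;
        }
        int cut = name.length() - splitIndex;
        return root + "/" + name.substring(0, cut) + "/" + name.substring(cut);
    }

    public static void main(String[] args) {
        // "/tokens" is a made-up root path.
        System.out.println(pathFor("/tokens", 12345, 2));
        System.out.println(pathFor("/tokens", 7, 2));
        System.out.println(pathFor("/tokens", 7, 0));
    }
}
```

With a split index of 2, each intermediate znode holds at most 100 token znodes, and the RM would list the intermediate znodes first and then each one's children, keeping every individual getChildren response small.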