There are two unrelated problems here.
The ClassCastException occurs when the two tests run in a particular order. The setup method configures the fair scheduler, but one of the tests ignores the setup conf and uses a default one, so one test was running with the fair scheduler and the other with the capacity scheduler. If the capacity scheduler test runs first, the QueueMetrics for the queue is initialized as a CSQueueMetrics. Later, when the fair scheduler tries to reuse the already existing queue metrics, the cast fails because the object is the wrong type. I fixed this by having both tests use the same base config, and I also clear out the queue metrics between tests for good measure.
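To illustrate the ordering dependence, here is a minimal sketch of a process-wide metrics cache shared across tests. It is a simplified model, not the actual Hadoop code: every class and method name below (SketchQueueMetrics, SketchCSQueueMetrics, SketchFSQueueMetrics, QueueMetricsCacheSketch, forCapacityScheduler, forFairScheduler, clearQueueMetrics) is a hypothetical stand-in.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for QueueMetrics and its scheduler-specific subclasses.
class SketchQueueMetrics {
    final String queueName;
    SketchQueueMetrics(String queueName) { this.queueName = queueName; }
}

class SketchCSQueueMetrics extends SketchQueueMetrics {
    SketchCSQueueMetrics(String queueName) { super(queueName); }
}

class SketchFSQueueMetrics extends SketchQueueMetrics {
    SketchFSQueueMetrics(String queueName) { super(queueName); }
}

class QueueMetricsCacheSketch {
    // Static cache keyed by queue name; it outlives any single test.
    private static final Map<String, SketchQueueMetrics> CACHE = new HashMap<>();

    // Capacity scheduler path: registers a CS-flavoured metrics object if none exists.
    static SketchQueueMetrics forCapacityScheduler(String queue) {
        return CACHE.computeIfAbsent(queue, SketchCSQueueMetrics::new);
    }

    // Fair scheduler path: reuses whatever is cached and assumes it is the FS type.
    static SketchFSQueueMetrics forFairScheduler(String queue) {
        SketchQueueMetrics metrics = CACHE.computeIfAbsent(queue, SketchFSQueueMetrics::new);
        return (SketchFSQueueMetrics) metrics; // fails if a CS object was cached first
    }

    // Called between tests so one test's scheduler choice cannot leak into the next.
    static void clearQueueMetrics() {
        CACHE.clear();
    }

    public static void main(String[] args) {
        forCapacityScheduler("root");
        try {
            forFairScheduler("root");
        } catch (ClassCastException e) {
            System.out.println("cast failed: cached metrics are the wrong type");
        }
        clearQueueMetrics();
        forFairScheduler("root"); // succeeds once the cache is cleared
    }
}
```

In this model the failure only appears when the capacity scheduler path runs first, which matches the order sensitivity described above. Clearing the cache between tests removes the dependence on order even if the configs diverge again.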
The rolling master key failure is triggered by a small, benign window of time in the AbstractDelegationTokenSecretManager during which a master key has had its expiry updated but the update has not yet reached the state store. The test occasionally catches this window, and because DelegationKey uses the expiration date in its hashCode and equals methods, the contains method on the set of delegation keys fails to find the key. As Daryn Sharp pointed out to me offline, arguably DelegationKey should not be using the expiration date for hashCode/equals. However, a lot of code uses and derives from DelegationKey, so removing it is a somewhat risky change. Instead I updated the unit test to check for a matching key ID in the state store rather than rely on the contains method.
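The lookup failure can be reproduced with a small sketch. SketchDelegationKey is a hypothetical, simplified key that includes the expiry date in equals and hashCode, as described above; it is not the real DelegationKey, and containsKeyId is an illustrative helper rather than the actual test code.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical simplified key whose identity includes the expiry date.
class SketchDelegationKey {
    final int keyId;
    final long expiryDate;

    SketchDelegationKey(int keyId, long expiryDate) {
        this.keyId = keyId;
        this.expiryDate = expiryDate;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SketchDelegationKey)) return false;
        SketchDelegationKey other = (SketchDelegationKey) o;
        return keyId == other.keyId && expiryDate == other.expiryDate;
    }

    @Override
    public int hashCode() {
        return Objects.hash(keyId, expiryDate);
    }
}

class KeyLookupSketch {
    // Matches on key ID only, so a differing expiry date does not matter.
    static boolean containsKeyId(Set<SketchDelegationKey> keys, int keyId) {
        for (SketchDelegationKey key : keys) {
            if (key.keyId == keyId) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The state store still holds the key with its old expiry.
        Set<SketchDelegationKey> stateStoreKeys = new HashSet<>();
        stateStoreKeys.add(new SketchDelegationKey(7, 1000L));

        // The secret manager has already bumped the expiry of the same key.
        SketchDelegationKey updated = new SketchDelegationKey(7, 2000L);

        System.out.println("contains: " + stateStoreKeys.contains(updated));
        System.out.println("matching key ID: " + containsKeyId(stateStoreKeys, updated.keyId));
    }
}
```

Here contains reports the key as missing because the two objects are not equal, while the key ID comparison still finds it. That is the reasoning behind changing the test assertion instead of DelegationKey itself.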
If we don't update the DelegationKey hashCode/equals, then there should be a followup JIRA to fix the MemoryRMStateStore, as it currently leaks delegation keys as they roll. When a key's expiration date is updated, the state store can no longer find the key in its set of keys and so never removes it. A simple fix is to store keys by key ID, as the other RM state store implementations and the secret manager itself already do.
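A rough sketch of the leak and of the suggested fix follows. RollingKey, SetBackedKeyStore, and IdBackedKeyStore are hypothetical names for illustration and do not mirror the MemoryRMStateStore API; the only assumption carried over is that the key's equals and hashCode include the expiry date.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical key whose equals/hashCode include the expiry date.
class RollingKey {
    final int keyId;
    final long expiryDate;

    RollingKey(int keyId, long expiryDate) {
        this.keyId = keyId;
        this.expiryDate = expiryDate;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RollingKey)) return false;
        RollingKey other = (RollingKey) o;
        return keyId == other.keyId && expiryDate == other.expiryDate;
    }

    @Override
    public int hashCode() {
        return Objects.hash(keyId, expiryDate);
    }
}

// Leaky variant: removal depends on equals/hashCode, so an updated expiry misses.
class SetBackedKeyStore {
    final Set<RollingKey> keys = new HashSet<>();
    void store(RollingKey key) { keys.add(key); }
    boolean remove(RollingKey key) { return keys.remove(key); }
}

// Suggested variant: keyed by key ID, so the expiry date is irrelevant to lookup.
class IdBackedKeyStore {
    final Map<Integer, RollingKey> keys = new HashMap<>();
    void store(RollingKey key) { keys.put(key.keyId, key); }
    boolean remove(RollingKey key) { return keys.remove(key.keyId) != null; }
}

class KeyStoreLeakSketch {
    public static void main(String[] args) {
        RollingKey stored = new RollingKey(1, 1000L);
        RollingKey rolled = new RollingKey(1, 2000L); // same key, expiry updated

        SetBackedKeyStore leaky = new SetBackedKeyStore();
        leaky.store(stored);
        leaky.remove(rolled);
        System.out.println("set-backed keys left: " + leaky.keys.size());

        IdBackedKeyStore byId = new IdBackedKeyStore();
        byId.store(stored);
        byId.remove(rolled);
        System.out.println("id-backed keys left: " + byId.keys.size());
    }
}
```

The set-backed store keeps the stale entry after the remove call, while the ID-keyed store drops it. Keying by ID also avoids touching DelegationKey's equals and hashCode, which is the riskier change.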