Making the cluster ID should be part of the ConnectionId makes sense (it's part of the identification of the remote cluster). But I think it's addressing only part of the problem. Going through this code, it seems like we have a lot of mismatch between the higher layers and the lower layers, with too much abstraction in between. At the lower layers, most of the ClientCache stuff seems completely unused. We currently effectively have an HBaseClient singleton (for SecureClient as well in 0.92/0.94) in the client code, as I don't see anything that calls the constructor or RpcEngine.getProxy() versions with a non-default socket factory. So a lot of the code around this seems like built up waste.
The fact that a single Configuration is fixed in the HBaseClient seems like a broken abstraction as it currently stands. In addition to cluster ID, other configuration parameters (max retries, retry sleep) are fixed at time of construction. The more I look at the code, the more it looks like the ClientCache and sharing the HBaseClient instance is an unnecessary complication. Why cache the HBaseClient instances at all? In HConnectionManager, we already have a mapping from Configuration to HConnection. It seems to me like each HConnection(Implementation) instance should have it's own HBaseClient instance, doing away with the ClientCache mapping. This would keep each HBaseClient associated with a single cluster/configuration and fix the current breakage from reusing the same HBaseClient against different clusters.
Of course, this would become a larger refactoring, probably reworking some of the HBaseRPC and RpcEngine methods to allow this. Off hand, we might want to expose a separate RpcEngine.getClient() method that returns a new RpcClient interface (implemented by HBaseClient) and move the RpcEngine.getProxy()/stopProxy() implementations into the client. So all proxy invocations can go through the same client, without requiring the static client cache. I haven't fully thought this through, so I could be missing other important aspects. But that approach at least seems like a step in the right direction for fixing the client abstractions.