I've stared at this thing for quite a while now, and I think I've finally found a way to look at it that makes sense. The way I'm looking at the NN configuration is that we can configure one or more name services. The first name service is an implicit name service defined by servicerpc-address and rpc-address. All additional name services are explicitly named and are configured through dfs.nameservices et al. defaultFS must either point to a service that is already defined (i.e. by rpc-address), or it must define the client service for the implicit name service.
For each name service, the getNsServiceRpcUris() method should return exactly one URI. If I have a configuration with:
dfs.nameservices = ns1
dfs.namenode.rpc-address.ns1 = nn1addr:p1
dfs.namenode.servicerpc-address = nn2addr:p2
fs.defaultFS = nn2addr:p3
then I have two distinct name services: ns1 and the implicit name service. The getNsServiceRpcUris() method should return nn1addr:p1 and nn2addr:p2, but not nn2addr:p3, because that's just another interface to the implicit name service. Now suppose the configuration is instead:
dfs.nameservices = ns1
dfs.namenode.servicerpc-address.ns1 = nn1addr:p1
fs.defaultFS = nn1addr:p2
Again, we treat these as two separate name services and report nn1addr:p1 and nn1addr:p2. If defaultFS had pointed to nn1addr:p1, then we would only report that address once, because both instances are clearly the same service.
Finally, we get to the one where this approach diverges from prior art:
dfs.nameservices = ns1,ns2
dfs.ha.namenodes.ns1 = nn1,nn2
dfs.namenode.rpc-address.ns1.nn1 = nn1addr:p1
dfs.namenode.rpc-address.ns1.nn2 = nn2addr:p1
dfs.namenode.servicerpc-address.ns2 = nn3addr:p1
dfs.namenode.rpc-address = nn1addr:p2
fs.defaultFS = nn2addr:p2
This example comes from TestDFSUtil. In that test class, the expectation is that we will return four URIs: ns1, nn3addr:p1, nn1addr:p2, nn2addr:p2. Using my definition, the getNsServiceRpcUris() method would return three URIs: ns1, nn3addr:p1, nn1addr:p2. The defaultFS would be ignored because a service URI is already defined for the implicit name service.
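To make the proposed rule concrete, here is a minimal sketch of how I read it. It is not the actual DFSUtil code: the NameServiceUris class and its resolve() helper are hypothetical, the configuration is modeled as a plain Map rather than a Hadoop Configuration, and addresses are compared as plain strings exactly as written in the examples above. It also assumes that servicerpc-address wins over rpc-address when both are set for the same name service, and that an HA name service is reported by its logical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of the proposed semantics; not Hadoop code. */
final class NameServiceUris {
    static final String NAMESERVICES = "dfs.nameservices";
    static final String HA_NAMENODES = "dfs.ha.namenodes";
    static final String SERVICE_RPC = "dfs.namenode.servicerpc-address";
    static final String RPC = "dfs.namenode.rpc-address";
    static final String DEFAULT_FS = "fs.defaultFS";

    /** Returns one entry per name service, in configuration order, without duplicates. */
    static List<String> resolve(Map<String, String> conf) {
        Set<String> uris = new LinkedHashSet<>();

        // Explicitly named name services.
        for (String raw : conf.getOrDefault(NAMESERVICES, "").split(",")) {
            String ns = raw.trim();
            if (ns.isEmpty()) {
                continue;
            }
            if (conf.containsKey(HA_NAMENODES + "." + ns)) {
                // Assumption: an HA name service is reported by its logical name.
                uris.add(ns);
            } else {
                String addr = firstDefined(conf, SERVICE_RPC + "." + ns, RPC + "." + ns);
                if (addr != null) {
                    uris.add(addr);
                }
            }
        }

        // Implicit name service: unsuffixed keys first, defaultFS only as a fallback.
        String implicit = firstDefined(conf, SERVICE_RPC, RPC);
        if (implicit == null) {
            implicit = conf.get(DEFAULT_FS);
        }
        if (implicit != null) {
            // The set drops defaultFS if it repeats an address already reported.
            uris.add(implicit);
        }
        return new ArrayList<>(uris);
    }

    /** Returns the value of the first key that is set and non-empty, or null. */
    private static String firstDefined(Map<String, String> conf, String... keys) {
        for (String key : keys) {
            String value = conf.get(key);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return null;
    }
}
```

Under these assumptions, calling NameServiceUris.resolve() on the TestDFSUtil configuration yields ns1, nn3addr:p1, nn1addr:p2, and nn2addr:p2 is dropped because rpc-address already defines the implicit name service.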
Since I'm proposing a change to some basic configuration semantics, I'd like some external validation. The getNsServiceRpcUris() method is only used by the mover and the balancer, so I might be overthinking this a little. Nonetheless, I think it's important to do it right and then document it thoroughly so that no one else has to try to reconstruct the intention again later.