Description
When bringing up a new Kudu cluster with multiple masters, the masters must be brought up together and should all start within a short time window of 30 seconds (FLAGS_raft_get_node_instance_timeout_ms).
However, while bringing up multiple masters on Kubernetes, we noticed that the bring-up sometimes fails because the masters are not all started within that window. Simply raising FLAGS_raft_get_node_instance_timeout_ms did not help in some cases, because DNS resolution would fail in SetPermanentUuidForRemotePeer() at the very beginning:
E0827 19:28:53.052981 91 master.cc:279] Unable to init master catalog manager: Network error: Unable to initialize catalog manager: Failed to initialize sys tables async: Failed to create new distributed Raft config: Unable to resolve UUID for peer member_type: VOTER last_known_addr { host: "kudu-master-0.kudu-masters.warehouse-1630092493-z2sz.svc.cluster.local" port: 7051 }: unable to resolve address for kudu-master-0.kudu-masters.warehouse-1630092493-z2sz.svc.cluster.local: Name or service not known
So SetPermanentUuidForRemotePeer() needs to retry on proxy creation and DNS resolution failures, in addition to retrying the RPC request.
https://github.com/apache/kudu/blob/master/src/kudu/consensus/consensus_peers.cc#L627
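
To illustrate the proposed behavior, here is a minimal, self-contained sketch of a deadline-bounded retry loop in which a single attempt covers the whole sequence (address resolution, proxy creation, RPC), so a DNS failure is retried the same way as an RPC failure. This is not Kudu's code: the Status struct is a hypothetical stand-in for kudu::Status, and RetryUntilDeadline and its backoff parameters are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for kudu::Status; not the real Kudu type.
struct Status {
  bool ok;
  std::string message;
  static Status OK() { return {true, ""}; }
  static Status NetworkError(std::string msg) { return {false, std::move(msg)}; }
};

// Hypothetical helper: calls `attempt` until it succeeds or `timeout` elapses,
// sleeping with capped exponential backoff between tries. `attempt` is
// expected to perform every step that can fail transiently (resolve the
// address, create the proxy, send the RPC), so all of them are retried.
Status RetryUntilDeadline(
    const std::function<Status()>& attempt,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds initial_backoff = std::chrono::milliseconds(10),
    std::chrono::milliseconds max_backoff = std::chrono::milliseconds(1000)) {
  using Clock = std::chrono::steady_clock;
  const auto deadline = Clock::now() + timeout;
  auto backoff = initial_backoff;
  while (true) {
    Status s = attempt();
    if (s.ok) return s;

    const auto now = Clock::now();
    if (now >= deadline) {
      // Keep the last underlying error so the log still shows the root cause.
      return Status::NetworkError("timed out; last error: " + s.message);
    }
    // Never sleep past the deadline.
    const auto remaining =
        std::chrono::duration_cast<std::chrono::milliseconds>(deadline - now);
    std::this_thread::sleep_for(std::min(backoff, remaining));
    backoff = std::min(backoff * 2, max_backoff);
  }
}
```

In the actual fix, the callable would wrap the resolution, proxy creation, and RPC steps of SetPermanentUuidForRemotePeer(), with FLAGS_raft_get_node_instance_timeout_ms as the overall budget. The point of the sketch is only that the retry boundary encloses name resolution, which matters on Kubernetes where a peer's DNS record may not exist until its pod has started.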