Kudu / KUDU-3312

SetPermanentUuidForRemotePeer() isn't resilient to DNS resolution failure


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: consensus, master
    • Labels: None

    Description

      When bringing up a new Kudu cluster with multiple masters, the masters must be brought up together and should all start within a short time window of 30 seconds (FLAGS_raft_get_node_instance_timeout_ms).

      However, while bringing up multiple masters on Kubernetes, I noticed that the bring-up sometimes fails because the masters don't all start within that short window. Simply setting FLAGS_raft_get_node_instance_timeout_ms to a higher value didn't help in some cases, because DNS resolution would fail in SetPermanentUuidForRemotePeer() at the very beginning, before any retry takes place.

       E0827 19:28:53.052981 91 master.cc:279] Unable to init master catalog manager: Network error: Unable to initialize catalog manager: Failed to initialize sys tables async: Failed to create new distributed Raft config: Unable to resolve UUID for peer member_type: VOTER last_known_addr { host: "kudu-master-0.kudu-masters.warehouse-1630092493-z2sz.svc.cluster.local" port: 7051 }: unable to resolve address for kudu-master-0.kudu-masters.warehouse-1630092493-z2sz.svc.cluster.local: Name or service not known
      

      So SetPermanentUuidForRemotePeer() needs to retry on proxy creation/DNS resolution failures, in addition to retrying the RPC request.
      https://github.com/apache/kudu/blob/master/src/kudu/consensus/consensus_peers.cc#L627
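
      A minimal sketch of the proposed behavior, not the actual Kudu code: proxy creation (which is where DNS resolution happens) and the RPC are wrapped in one retry loop that shares a single deadline. SimpleStatus, RetryUntilDeadline, SetUuidWithRetry, the two callbacks, and the backoff values are all hypothetical names and numbers chosen for illustration; in Kudu the deadline would presumably come from FLAGS_raft_get_node_instance_timeout_ms.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for kudu::Status; only what this sketch needs.
struct SimpleStatus {
  bool ok;
  std::string message;
  static SimpleStatus OK() { return {true, ""}; }
  static SimpleStatus NetworkError(std::string msg) {
    return {false, std::move(msg)};
  }
};

using Clock = std::chrono::steady_clock;
using Millis = std::chrono::milliseconds;

// Calls 'attempt' until it succeeds or 'timeout' elapses, sleeping with
// exponential backoff between tries. Returns the last status seen.
SimpleStatus RetryUntilDeadline(const std::function<SimpleStatus()>& attempt,
                                Millis timeout,
                                Millis initial_delay = Millis(10),
                                Millis max_delay = Millis(1000)) {
  assert(initial_delay.count() > 0 && max_delay >= initial_delay);
  const Clock::time_point deadline = Clock::now() + timeout;
  Millis delay = initial_delay;
  while (true) {
    SimpleStatus s = attempt();
    if (s.ok) return s;
    const Clock::time_point now = Clock::now();
    if (now >= deadline) return s;
    // Never sleep past the deadline.
    const Millis remaining =
        std::chrono::duration_cast<Millis>(deadline - now);
    std::this_thread::sleep_for(std::min(delay, remaining));
    delay = std::min<Millis>(delay * 2, max_delay);
  }
}

// Hypothetical shape of the fixed flow: a failure in either step
// (DNS/proxy creation or the RPC itself) leads to another attempt.
SimpleStatus SetUuidWithRetry(
    const std::function<SimpleStatus()>& create_proxy,
    const std::function<SimpleStatus()>& get_node_instance,
    Millis timeout) {
  return RetryUntilDeadline(
      [&]() {
        SimpleStatus s = create_proxy();
        if (!s.ok) return s;  // DNS failure: retry instead of bailing out.
        return get_node_instance();
      },
      timeout);
}
```

      With this shape, a peer whose DNS record is not yet published (as can happen while Kubernetes pods are still starting) delays startup rather than failing it, as long as the name becomes resolvable before the deadline.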
       

          People

            Assignee: Unassigned
            Reporter: Bankim Bhavsar (bankim)