KUDU-3278

DNS entry removal of a tablet server causes one of its peers to crash


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.10.0, 1.14.0
    • Fix Version/s: n/a
    • Component/s: consensus, tserver
    • Labels: None

    Description

      Steps to reproduce:

      Suppose a tablet T1 has three replicas, hosted on tablet servers TS1, TS2, and TS3.

      If TS1 and TS2 are unable to resolve TS3's hostname, one of TS1 or TS2 crashes during a leader election or pre-election, regardless of whether TS3 is running.

      Sample failure logs:

      W0429 04:14:11.043696 801167 leader_election.cc:270] T ecf3e9d1608a4d03ac69a09f0df54b9e P b4eb8f7b19dd4b94a313d8674779b350 [CANDIDATE]: Term 9 election: Was unable to construct an RPC proxy to peer dddc42c5a10b461cb92465815413e996: Network error: unable to resolve address for achennaka-kudu-4.achennaka-kudu.root.hwx.site: Name or service not known. Counting it as a 'NO' vote.
      F0429 04:14:11.046133 801167 raft_consensus.cc:2743] Check failed: _s.ok() Bad status: Network error: Could not obtain a remote proxy to the peer.: unable to resolve address for achennaka-kudu-4.achennaka-kudu.root.hwx.site: Name or service not known
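
The two log lines above show the same resolution failure handled in two ways: the election code counts it as a 'NO' vote, while a later check treats it as fatal. The following is a minimal, self-contained sketch of the first behaviour, where an unresolvable peer only costs a vote and never aborts the process. It is not Kudu code; `Peer`, `ResolveError`, `run_election`, the `resolve` and `request_vote` callbacks, and the `example.com` hostnames are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class ResolveError(Exception):
    """Raised by the (hypothetical) resolver when a hostname cannot be resolved."""


@dataclass
class Peer:
    uuid: str
    host: str


def run_election(
    self_uuid: str,
    peers: List[Peer],
    resolve: Callable[[str], str],
    request_vote: Callable[[Peer, str], bool],
) -> Dict[str, object]:
    """Tally votes among all replicas, treating an unresolvable peer as a 'NO' vote."""
    yes, no = 1, 0  # the candidate always votes for itself
    unresolved: List[str] = []
    for peer in peers:
        if peer.uuid == self_uuid:
            continue
        try:
            addr = resolve(peer.host)
        except ResolveError:
            # Degrade gracefully: record a 'NO' vote instead of aborting.
            no += 1
            unresolved.append(peer.uuid)
            continue
        if request_vote(peer, addr):
            yes += 1
        else:
            no += 1
    majority = len(peers) // 2 + 1
    return {"won": yes >= majority, "yes": yes, "no": no, "unresolved": unresolved}


# Scenario from the description: TS3 has no DNS entry, TS2 grants its vote.
peers = [
    Peer("ts1", "ts1.example.com"),
    Peer("ts2", "ts2.example.com"),
    Peer("ts3", "ts3.example.com"),
]
dns = {"ts1.example.com": "10.0.0.1", "ts2.example.com": "10.0.0.2"}


def resolve(host: str) -> str:
    if host not in dns:
        raise ResolveError(host)
    return dns[host]


result = run_election("ts1", peers, resolve, lambda peer, addr: True)
```

In this sketch TS1 still reaches a majority with its own vote plus TS2's, so the missing DNS entry for TS3 does not prevent the election from completing.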


          People

            Assignee: Unassigned
            Reporter: Abhishek Chennaka (achennaka@cloudera.com)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved:
