Ratis / RATIS-967 Support priority in leader election / RATIS-1265

Fix leader election with priority too slow

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      As the attached log shows, there are three servers, s0, s1, and s2, and s2 is the leader. We then give s0 the highest priority, so s2 calls yieldLeaderToHigherPriorityPeer(s0) once s0's log has caught up. In yieldLeaderToHigherPriorityPeer, s2 steps down.
      But when s2 steps down, which server requests votes first is almost random. If s0 cannot request votes within a short time, the leader election lasts a long time.

      As the attached log shows, elections happened 8 times over about 14 seconds, but s0 only tried to start a leader election on the 6th attempt, and even then it could not get the leadership.
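To make the "almost random" point concrete, here is a rough simulation sketch. It is not Ratis code and rests on a deliberately simplified model: each round, every server draws an independent random election timeout, and only the first server to time out becomes a candidate. The function names and the 150 to 300 ms timeout range are hypothetical, chosen only for illustration.

```python
import random


def rounds_until_s0_first(rng, servers=("s0", "s1", "s2"), timeout_ms=(150, 300)):
    """Count election rounds until s0 is the first server to time out.

    Assumed model (not taken from Ratis): each round, every server draws an
    independent uniform timeout, and the smallest timeout starts the election.
    """
    rounds = 0
    while True:
        rounds += 1
        timeouts = {s: rng.uniform(*timeout_ms) for s in servers}
        first = min(timeouts, key=timeouts.get)
        if first == "s0":
            return rounds


def mean_rounds(trials=10_000, seed=0):
    """Average number of rounds before s0 gets to request votes first."""
    rng = random.Random(seed)
    return sum(rounds_until_s0_first(rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print(f"mean rounds until s0 starts first: {mean_rounds():.2f}")
```

Under this model s0 is first in any given round with probability 1/3, so on average about three rounds pass before s0 even gets to start an election. The attached log is worse than that, because s0's own attempt can still lose to a candidate that starts shortly afterwards.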

      2020-12-25 10:11:34,995  s1: start s1@group-241716F733F8-LeaderElection2  failed because s0 rejected
      2020-12-25 10:11:37,228  s2: start s2@group-241716F733F8-LeaderElection3  failed because s0 rejected
      2020-12-25 10:11:39,345  s1: start s1@group-241716F733F8-LeaderElection4  failed because s0 rejected
      2020-12-25 10:11:41,600  s1: start s1@group-241716F733F8-LeaderElection5  failed because s0 rejected
      2020-12-25 10:11:43,710  s2: start s2@group-241716F733F8-LeaderElection6  failed because s0 rejected

      2020-12-25 10:11:46,248  s0: start s0@group-241716F733F8-LeaderElection7  failed because s1 started its own election about 200 ms later and s1's vote request reached s2 before s0's; s2 voted for s1 at 2020-12-25 10:11:46,469, and both s1 (which had voted for itself) and s2 rejected s0 at 2020-12-25 10:11:47,267

      2020-12-25 10:11:46,461  s1: start s1@group-241716F733F8-LeaderElection8  failed because s0 rejected
      2020-12-25 10:11:48,597  s2: start s2@group-241716F733F8-LeaderElection9  failed because s0 rejected
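The figures quoted in the description (8 elections, roughly 14 seconds, s0 first trying on the 6th attempt) can be checked against the log lines above with a small script. This is only a sketch: the `ELECTIONS` list is copied by hand from the log, and `summarize` is a hypothetical helper name.

```python
from datetime import datetime

# (election start time, candidate) pairs copied from the log excerpt.
ELECTIONS = [
    ("2020-12-25 10:11:34,995", "s1"),
    ("2020-12-25 10:11:37,228", "s2"),
    ("2020-12-25 10:11:39,345", "s1"),
    ("2020-12-25 10:11:41,600", "s1"),
    ("2020-12-25 10:11:43,710", "s2"),
    ("2020-12-25 10:11:46,248", "s0"),
    ("2020-12-25 10:11:46,461", "s1"),
    ("2020-12-25 10:11:48,597", "s2"),
]


def summarize(elections):
    """Return (election count, seconds from first to last start, s0's first attempt)."""
    times = [datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f") for ts, _ in elections]
    span = (max(times) - min(times)).total_seconds()
    # 1-based position of the first election started by s0.
    first_s0 = next(i for i, (_, who) in enumerate(elections, start=1) if who == "s0")
    return len(elections), span, first_s0


if __name__ == "__main__":
    count, span, first_s0 = summarize(ELECTIONS)
    print(f"{count} elections over {span:.3f} s; s0 first tried on attempt {first_s0}")
```

The span here runs from the first election start to the last election start, so it slightly understates the total time until a leader is finally elected.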
      

          People

            yjxxtd runzhiwang