Description
RaftServerImpl creates a new FollowerState for each vote request received. The old FollowerState may trigger change to candidate role after it was shutdown by the server. Both RaftServerImpl#requestVote and the change to candidate by FollowerState requires lock on server (RaftServerImpl object). The order of events which triggers the problem:
- RaftServerImpl executing requestVote with lock on itself
- FollowerState checks conditions, then waits for lock on server
- RaftServerImpl shuts down old FollowerState, starts new one
- RaftServerImpl sets last RPC time on new FollowerState
- RaftServerImpl releases lock
- old FollowerState enters block with lock on server, triggers change to candidate
Log from Ozone DN where this was seen:
c7514957 votes for 18b95dfb:
2022-01-07 01:24:07,792 [grpc-default-executor-0] INFO server.RaftServer$Division (RaftServerImpl.java:requestVote(1046)) - c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075: receive requestVote(ELECTION, 18b95dfb-8ef7-4011-a842-0dc8d0be8521, group-9EECFC4A6075, 2, (t:0, i:0)) 2022-01-07 01:24:07,792 [grpc-default-executor-0] INFO impl.VoteContext (VoteContext.java:log(48)) - c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FOLLOWER: accept ELECTION from 18b95dfb-8ef7-4011-a842-0dc8d0be8521: our priority 0 <= candidate's priority 1 2022-01-07 01:24:07,793 [grpc-default-executor-0] INFO server.RaftServer$Division (RaftServerImpl.java:setRole(290)) - c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075: changes role from FOLLOWER to FOLLOWER at term 2 for candidate:18b95dfb-8ef7-4011-a842-0dc8d0be8521 2022-01-07 01:24:07,793 [grpc-default-executor-0] INFO impl.RoleInfo (RoleInfo.java:shutdownFollowerState(108)) - c7514957-7287-4cfe-8c11-55f780daee7f: shutdown c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FollowerState 2022-01-07 01:24:07,793 [grpc-default-executor-0] INFO impl.RoleInfo (RoleInfo.java:updateAndGet(130)) - c7514957-7287-4cfe-8c11-55f780daee7f: start c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FollowerState 2022-01-07 01:24:07,794 [grpc-default-executor-0] INFO server.RaftServer$Division (RaftServerImpl.java:requestVote(1078)) - c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075 replies to ELECTION vote request: 18b95dfb-8ef7-4011-a842-0dc8d0be8521<-c7514957-7287-4cfe-8c11-55f780daee7f#0:OK-t2. Peer's state: c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075:t2, leader=null, voted=18b95dfb-8ef7-4011-a842-0dc8d0be8521, raftlog=c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-SegmentedRaftLog:OPENED:c-1, conf=-1: [18b95dfb-8ef7-4011-a842-0dc8d0be8521|rpc:10.1.0.7:39307|priority:1, c7514957-7287-4cfe-8c11-55f780daee7f|rpc:10.1.0.7:42065|priority:0, baceac6d-bbfe-42b4-9bde-396690ef56d9|rpc:10.1.0.7:38573|priority:0], old=null
but just after 3ms starts new election due to RPC timeout:
2022-01-07 01:24:07,795 [c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FollowerState] INFO impl.FollowerState (FollowerState.java:run(134)) - c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FollowerState: change to CANDIDATE, lastRpcElapsedTime:5172276476ns, electionTimeout:5172ms 2022-01-07 01:24:07,796 [c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FollowerState] INFO impl.RoleInfo (RoleInfo.java:shutdownFollowerState(108)) - c7514957-7287-4cfe-8c11-55f780daee7f: shutdown c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FollowerState 2022-01-07 01:24:07,796 [c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FollowerState] INFO server.RaftServer$Division (RaftServerImpl.java:setRole(290)) - c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075: changes role from FOLLOWER to CANDIDATE at term 2 for changeToCandidate 2022-01-07 01:24:07,796 [c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-FollowerState] INFO impl.RoleInfo (RoleInfo.java:updateAndGet(130)) - c7514957-7287-4cfe-8c11-55f780daee7f: start c7514957-7287-4cfe-8c11-55f780daee7f@group-9EECFC4A6075-LeaderElection7
Attachments
Issue Links
- links to