Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
To be honest, I am not sure whether this is related to RATIS-2129. However, I am using a build of Ratis 3.1.1 plus RATIS-2129, and I am seeing all kinds of errors when running HBase on Ozone.
Failed to take snapshot because the last applied transaction is not current:
2024-11-16 00:10:31,035 INFO [grpc-default-executor-22]-org.apache.ratis.server.RaftServer: e693615a-d484-4165-8446-dff08cac5978: remove FOLLOWER e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1:t229, leader=67eefe63-0930-42d7-a364-e46fde563ff1, voted=67eefe63-0930-42d7-a364-e46fde563ff1, raftlog=Memoized:e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-SegmentedRaftLog:OPENED:c1613342:last(t:229, i:1613343), conf=conf: {index: 1613340, cur=peers:[e693615a-d484-4165-8446-dff08cac5978|10.140.146.67:9856, 67eefe63-0930-42d7-a364-e46fde563ff1|10.140.86.199:9856, 7cc563b3-14b5-4334-820b-5c3bbecffad8|10.140.20.0:9856]|listeners:[], old=null} RUNNING
2024-11-16 00:10:31,038 INFO [grpc-default-executor-22]-org.apache.ratis.server.RaftServer$Division: e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1: shutdown
2024-11-16 00:10:31,039 INFO [grpc-default-executor-22]-org.apache.ratis.util.JmxRegister: Successfully un-registered JMX Bean with object name Ratis:service=RaftServer,group=group-AF4CEBD817A1,id=e693615a-d484-4165-8446-dff08cac5978
2024-11-16 00:10:31,039 INFO [grpc-default-executor-22]-org.apache.ratis.server.impl.RoleInfo: e693615a-d484-4165-8446-dff08cac5978: shutdown e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-FollowerState
2024-11-16 00:10:31,039 INFO [grpc-default-executor-22]-org.apache.ratis.server.impl.StateMachineUpdater: e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-StateMachineUpdater: set stopIndex = 1613342
2024-11-16 00:10:31,039 INFO [e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-FollowerState]-org.apache.ratis.server.impl.FollowerState: e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-FollowerState was interrupted
2024-11-16 00:10:31,043 ERROR [e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-StateMachineUpdater]-org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine: Failed to take snapshot for group-AF4CEBD817A1 as the stateMachine is unhealthy. The last applied index is at (t:216, i:1613313)
2024-11-16 00:10:31,043 ERROR [e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater: e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-StateMachineUpdater: Failed to take snapshot
org.apache.ratis.protocol.exceptions.StateMachineException: Failed to take snapshot for group-AF4CEBD817A1 as the stateMachine is unhealthy. The last applied index is at (t:216, i:1613313)
    at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.takeSnapshot(ContainerStateMachine.java:356)
    at org.apache.ratis.server.impl.StateMachineUpdater.takeSnapshot(StateMachineUpdater.java:286)
    at org.apache.ratis.server.impl.StateMachineUpdater.checkAndTakeSnapshot(StateMachineUpdater.java:278)
    at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:194)
    at java.lang.Thread.run(Thread.java:748)
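To put a number on how far behind the state machine was in the log above, here is a minimal sketch that extracts the two indices and subtracts them. The helper name `snapshot_gap` is hypothetical and is not part of Ratis or Ozone. It only reads the log text, with the log prefixes shortened, and says nothing about why the gap exists.

```python
import re

# Two messages copied from the log above (timestamps and thread names omitted).
LOG = """\
StateMachineUpdater: set stopIndex = 1613342
Failed to take snapshot for group-AF4CEBD817A1 as the stateMachine is unhealthy. The last applied index is at (t:216, i:1613313)
"""


def snapshot_gap(log: str) -> int:
    """Return stopIndex minus the last applied index found in the log text."""
    stop = re.search(r"set stopIndex = (\d+)", log)
    applied = re.search(r"last applied index is at \(t:\d+, i:(\d+)\)", log)
    if stop is None or applied is None:
        raise ValueError("log does not contain both indices")
    return int(stop.group(1)) - int(applied.group(1))


print(snapshot_gap(LOG))  # 29
```

In this log the updater was told to stop at index 1613342 while the state machine reported having applied only up to index 1613313, a gap of 29 entries.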
Log entry not found
2024-11-14 01:59:37,516 WARN [7cc563b3-14b5-4334-820b-5c3bbecffad8@group-0C8C280DCAED->67eefe63-0930-42d7-a364-e46fde563ff1-GrpcLogAppender-LogAppenderDaemon]-org.apache.ratis.server.leader.LogAppenderDaemon: 7cc563b3-14b5-4334-820b-5c3bbecffad8@group-0C8C280DCAED->67eefe63-0930-42d7-a364-e46fde563ff1-GrpcLogAppender-LogAppenderDaemon failed
org.apache.ratis.server.raftlog.RaftLogIOException: Log entry not found: index = 3205
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:301)
    at org.apache.ratis.server.leader.LogAppenderBase.newAppendEntriesRequest(LogAppenderBase.java:240)
    at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:387)
    at org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:262)
    at org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:80)
    at java.lang.Thread.run(Thread.java:748)
HDDS-11720 seems to be related too.
Attachments
Issue Links
- is caused by: RATIS-2129 Low replication performance because of lock contention on RaftLog (Resolved)