  1. Ratis
  2. RATIS-2192

Lots of errors after applying RATIS-2129

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      To be honest, I am not sure whether this is related to RATIS-2129. However, I am using a build of Ratis 3.1.1 plus RATIS-2129, and I am seeing several kinds of errors when running HBase on Ozone.

      Failed to take snapshot because the last applied transaction is not current:

      2024-11-16 00:10:31,035 INFO [grpc-default-executor-22]-org.apache.ratis.server.RaftServer: e693615a-d484-4165-8446-dff08cac5978: remove  FOLLOWER e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1:t229, leader=67eefe63-0930-42d7-a364-e46fde563ff1, voted=67eefe63-0930-42d7-a364-e46fde563ff1, raftlog=Memoized:e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-SegmentedRaftLog:OPENED:c1613342:last(t:229, i:1613343), conf=conf: {index: 1613340, cur=peers:[e693615a-d484-4165-8446-dff08cac5978|10.140.146.67:9856, 67eefe63-0930-42d7-a364-e46fde563ff1|10.140.86.199:9856, 7cc563b3-14b5-4334-820b-5c3bbecffad8|10.140.20.0:9856]|listeners:[], old=null} RUNNING
      2024-11-16 00:10:31,038 INFO [grpc-default-executor-22]-org.apache.ratis.server.RaftServer$Division: e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1: shutdown
      2024-11-16 00:10:31,039 INFO [grpc-default-executor-22]-org.apache.ratis.util.JmxRegister: Successfully un-registered JMX Bean with object name Ratis:service=RaftServer,group=group-AF4CEBD817A1,id=e693615a-d484-4165-8446-dff08cac5978
      2024-11-16 00:10:31,039 INFO [grpc-default-executor-22]-org.apache.ratis.server.impl.RoleInfo: e693615a-d484-4165-8446-dff08cac5978: shutdown e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-FollowerState
      2024-11-16 00:10:31,039 INFO [grpc-default-executor-22]-org.apache.ratis.server.impl.StateMachineUpdater: e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-StateMachineUpdater: set stopIndex = 1613342
      2024-11-16 00:10:31,039 INFO [e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-FollowerState]-org.apache.ratis.server.impl.FollowerState: e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-FollowerState was interrupted
      2024-11-16 00:10:31,043 ERROR [e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-StateMachineUpdater]-org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine: Failed to take snapshot  for group-AF4CEBD817A1 as the stateMachine is unhealthy. The last applied index is at (t:216, i:1613313)
      2024-11-16 00:10:31,043 ERROR [e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater: e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-StateMachineUpdater: Failed to take snapshot
      org.apache.ratis.protocol.exceptions.StateMachineException: Failed to take snapshot  for group-AF4CEBD817A1 as the stateMachine is unhealthy. The last applied index is at (t:216, i:1613313)
              at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.takeSnapshot(ContainerStateMachine.java:356)
              at org.apache.ratis.server.impl.StateMachineUpdater.takeSnapshot(StateMachineUpdater.java:286)
              at org.apache.ratis.server.impl.StateMachineUpdater.checkAndTakeSnapshot(StateMachineUpdater.java:278)
              at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:194)
              at java.lang.Thread.run(Thread.java:748)
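
To put a number on how far the state machine was behind in the excerpt above, here is a minimal sketch that extracts the stopIndex and the last applied index from two of the logged messages and reports the gap. The helper name `applied_gap` and the regular expressions are hypothetical and not part of Ratis or Ozone; they assume the message formats stay exactly as shown in this log.

```python
import re

# Two messages copied (abridged) from the datanode log excerpt above.
LOG_LINES = [
    "StateMachineUpdater: set stopIndex = 1613342",
    "Failed to take snapshot  for group-AF4CEBD817A1 as the stateMachine is unhealthy. "
    "The last applied index is at (t:216, i:1613313)",
]

# Assumed message formats, taken from the lines above.
STOP_INDEX = re.compile(r"set stopIndex = (\d+)")
LAST_APPLIED = re.compile(r"last applied index is at \(t:(\d+), i:(\d+)\)")


def applied_gap(lines):
    """Return (stop_index, last_applied_index, gap) found in the given log lines."""
    stop = applied = None
    for line in lines:
        m = STOP_INDEX.search(line)
        if m:
            stop = int(m.group(1))
        m = LAST_APPLIED.search(line)
        if m:
            applied = int(m.group(2))  # group 1 is the term, group 2 the index
    if stop is None or applied is None:
        raise ValueError("missing stopIndex or last applied index")
    return stop, applied, stop - applied


if __name__ == "__main__":
    print(applied_gap(LOG_LINES))  # (1613342, 1613313, 29)
```

For this excerpt the gap is 29 entries: the updater was asked to stop at index 1613342 while the last applied index was still 1613313.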
      

      Log entry not found:

      2024-11-14 01:59:37,516 WARN [7cc563b3-14b5-4334-820b-5c3bbecffad8@group-0C8C280DCAED->67eefe63-0930-42d7-a364-e46fde563ff1-GrpcLogAppender-LogAppenderDaemon]-org.apache.ratis.server.leader.LogAppenderDaemon: 7cc563b3-14b5-4334-820b-5c3bbecffad8@group-0C8C280DCAED->67eefe63-0930-42d7-a364-e46fde563ff1-GrpcLogAppender-LogAppenderDaemon failed
      org.apache.ratis.server.raftlog.RaftLogIOException: Log entry not found: index = 3205
              at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:301)
              at org.apache.ratis.server.leader.LogAppenderBase.newAppendEntriesRequest(LogAppenderBase.java:240)
              at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:387)
              at org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:262)
              at org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:80)
              at java.lang.Thread.run(Thread.java:748)
      

      HDDS-11720 seems to be related too.

      Attachments

        1. ozone-datanode.1.tgz
          983 kB
          Wei-Chiu Chuang
        2. ozone-datanode.2.tgz
          1.73 MB
          Wei-Chiu Chuang
        3. ozone-datanode.3.tgz
          1.25 MB
          Wei-Chiu Chuang

            People

              Assignee: Unassigned
              Reporter: Wei-Chiu Chuang (weichiu)
              Votes: 0
              Watchers: 1
