Uploaded image for project: 'Apache Ozone'
  1. Apache Ozone
  2. HDDS-433

ContainerStateMachine#readStateMachineData should properly build LogEntryProto

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2.1
    • Component/s: Ozone Datanode
    • Labels:
      None

      Description

      ContainerStateMachine#readStateMachineData returns LogEntryProto with index set to 0. This leads to exception in Ratis. The LogEntryProto to return should be built over the input LogEntryProto.

      The following exception was seen using Ozone, where the leader send incorrect append entries to follower.

      2018-08-20 07:54:06,200 INFO org.apache.ratis.server.storage.RaftLogWorker: Rolling segment:2e240240-0fac-4f93-8aa8-fa8f74bf1810_9858-RaftLogWorker index to:20312
      2018-08-20 07:54:07,800 INFO org.apache.ratis.server.impl.FollowerState: 2e240240-0fac-4f93-8aa8-fa8f74bf1810_9858 changes to CANDIDATE, lastRpcTime:1182, electionTimeout:990ms
      2018-08-20 07:54:07,800 INFO org.apache.ratis.server.impl.RaftServerImpl: 2e240240-0fac-4f93-8aa8-fa8f74bf1810_9858 changes role from org.apache.ratis.server.impl.RoleInfo@6b1e0fb8 to CANDIDATE at term 14
      for changeToCandidate
      2018-08-20 07:54:07,801 INFO org.apache.ratis.server.impl.RaftServerImpl: 2e240240-0fac-4f93-8aa8-fa8f74bf1810_9858 changes role from org.apache.ratis.server.impl.RoleInfo@6b1e0fb8 to FOLLOWER at term 14 
      for changeToFollower
      2018-08-20 07:54:21,712 INFO org.apache.ratis.server.impl.FollowerState: 2e240240-0fac-4f93-8aa8-fa8f74bf1810_9858 changes to CANDIDATE, lastRpcTime:2167, electionTimeout:976ms
      2018-08-20 07:54:21,712 INFO org.apache.ratis.server.impl.RaftServerImpl: 2e240240-0fac-4f93-8aa8-fa8f74bf1810_9858 changes role from org.apache.ratis.server.impl.RoleInfo@6b1e0fb8 to CANDIDATE at term 14
      for changeToCandidate
      2018-08-20 07:54:21,715 INFO org.apache.ratis.server.impl.RaftServerImpl: 2e240240-0fac-4f93-8aa8-fa8f74bf1810_9858: change Leader from 2bf278ca-2dad-4029-a387-2faeb10adef5_9858 to null at term 14 for ini
      tElection
      2018-08-20 07:54:29,151 INFO org.apache.ratis.server.impl.LeaderElection: 2e240240-0fac-4f93-8aa8-fa8f74bf1810_9858: begin an election in Term 15
      2018-08-20 07:54:30,735 INFO org.apache.ratis.server.impl.RaftServerImpl: 2e240240-0fac-4f93-8aa8-fa8f74bf1810_9858 changes role from org.apache.ratis.server.impl.RoleInfo@6b1e0fb8 to FOLLOWER at term 15 
      for changeToFollower
      2018-08-20 07:54:30,740 INFO org.apache.ratis.server.impl.RaftServerImpl: 2e240240-0fac-4f93-8aa8-fa8f74bf1810_9858: change Leader from null to b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858 at term 15 for app
      endEntries
       
      2018-08-20 07:54:30,741 INFO org.apache.ratis.server.impl.RaftServerImpl: 2e240240-0fac-4f93-8aa8-fa8f74bf1810_9858-org.apache.ratis.server.impl.RoleInfo@6b1e0fb8: Withhold vote from candidate b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858 with term 15. State: leader=b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858, term=15, lastRpcElapsed=0ms
       
      2018-08-20 07:54:30,745 INFO org.apache.ratis.server.impl.LeaderElection: 2e240240-0fac-4f93-8aa8-fa8f74bf1810_9858: Election REJECTED; received 1 response(s) [2e240240-0fac-4f93-8aa8-fa8f74bf1810_9858<-2
      bf278ca-2dad-4029-a387-2faeb10adef5_9858#0:FAIL-t15] and 0 exception(s); 2e240240-0fac-4f93-8aa8-fa8f74bf1810_9858:t15, leader=b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858, voted=2e240240-0fac-4f93-8aa8-fa8f
      74bf1810_9858, raftlog=[(t:14, i:20374)], conf=[b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858:172.26.32.231:9858, 2bf278ca-2dad-4029-a387-2faeb10adef5_9858:172.26.32.230:9858, 2e240240-0fac-4f93-8aa8-fa8f74bf
      1810_9858:172.26.32.228:9858], old=null
      2018-08-20 07:54:31,227 WARN org.apache.ratis.grpc.server.RaftServerProtocolService: 2e240240-0fac-4f93-8aa8-fa8f74bf1810_9858: Failed appendEntries b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858->2e240240-0fac-4f93-8aa8-fa8f74bf1810_9858#1
      java.lang.IllegalStateException: Unexpected Index: previous is (t:14, i:20374) but entries[0].getIndex()=0
              at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
              at org.apache.ratis.server.impl.RaftServerImpl.validateEntries(RaftServerImpl.java:786)
              at org.apache.ratis.server.impl.RaftServerImpl.appendEntriesAsync(RaftServerImpl.java:859)
              at org.apache.ratis.server.impl.RaftServerImpl.appendEntriesAsync(RaftServerImpl.java:824)
              at org.apache.ratis.server.impl.RaftServerProxy.appendEntriesAsync(RaftServerProxy.java:247)
              at org.apache.ratis.grpc.server.RaftServerProtocolService$1.onNext(RaftServerProtocolService.java:76)
              at org.apache.ratis.grpc.server.RaftServerProtocolService$1.onNext(RaftServerProtocolService.java:66)
              at org.apache.ratis.shaded.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:248)
              at org.apache.ratis.shaded.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:252)
              at org.apache.ratis.shaded.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:629)
              at org.apache.ratis.shaded.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
              at org.apache.ratis.shaded.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      2018-08-20 07:54:33,260 WARN org.apache.ratis.grpc.server.RaftServerProtocolService: 2e240240-0fac-4f93-8aa8-fa8f74bf1810_9858: Failed appendEntries b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858->2e240240-0fac-4f93-8aa8-fa8f74bf1810_9858#3
      java.lang.IllegalStateException: Unexpected Index: entries[1].getIndex()=0 but entries[0].getIndex()=0
      
      2018-08-20 11:54:29,174 INFO org.apache.ratis.grpc.server.RaftServerProtocolService: b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858: appendEntries completed
      2018-08-20 11:54:30,182 INFO org.apache.ratis.server.impl.FollowerState: b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858 changes to CANDIDATE, lastRpcTime:1332, electionTimeout:860ms
      2018-08-20 11:54:30,182 INFO org.apache.ratis.server.impl.RaftServerImpl: b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858 changes role from org.apache.ratis.server.impl.RoleInfo@465f9230 to CANDIDATE at term 14
      for changeToCandidate
      2018-08-20 11:54:30,185 INFO org.apache.ratis.server.impl.RaftServerImpl: b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858: change Leader from 2bf278ca-2dad-4029-a387-2faeb10adef5_9858 to null at term 14 for initElection
      2018-08-20 11:54:30,199 INFO org.apache.ratis.server.impl.LeaderElection: b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858: begin an election in Term 15
      2018-08-20 11:54:30,230 INFO org.apache.ratis.server.impl.LeaderElection: b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858: Election PASSED; received 1 response(s) [b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858<-2bf
      278ca-2dad-4029-a387-2faeb10adef5_9858#0:OK-t15] and 0 exception(s); b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858:t15, leader=null, voted=b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858, raftlog=[(t:14, i:20488)],
      conf=[b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858:172.26.32.231:9858, 2bf278ca-2dad-4029-a387-2faeb10adef5_9858:172.26.32.230:9858, 2e240240-0fac-4f93-8aa8-fa8f74bf1810_9858:172.26.32.228:9858], old=null
      2018-08-20 11:54:30,230 INFO org.apache.ratis.server.impl.RaftServerImpl: b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858 changes role from org.apache.ratis.server.impl.RoleInfo@465f9230 to LEADER at term 15 for changeToLeader
      2018-08-20 11:54:30,230 INFO org.apache.ratis.server.impl.RaftServerImpl: b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858: change Leader from null to b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858 at term 15 for becomeLeader
      2018-08-20 11:54:30,246 INFO org.apache.ratis.conf.ConfUtils: raft.server.staging.catchup.gap = 1000 (default)
      2018-08-20 11:54:30,246 INFO org.apache.ratis.conf.ConfUtils: raft.server.rpc.sleep.time = 25 ms (default)
      2018-08-20 11:54:30,269 INFO org.apache.ratis.conf.ConfUtils: raft.server.log.appender.buffer.capacity = 134217728 (custom)
      2018-08-20 11:54:30,270 INFO org.apache.ratis.conf.ConfUtils: raft.server.log.appender.batch.enabled = true (custom)
      2018-08-20 11:54:30,270 INFO org.apache.ratis.conf.ConfUtils: raft.server.log.appender.snapshot.chunk.size.max = 16MB (=16777216) (default)
      2018-08-20 11:54:30,273 INFO org.apache.ratis.conf.ConfUtils: raft.grpc.server.leader.outstanding.appends.max = 128 (default)
      2018-08-20 11:54:30,273 INFO org.apache.ratis.conf.ConfUtils: raft.server.rpc.request.timeout = 3000 ms (custom)
      2018-08-20 11:54:30,274 INFO org.apache.ratis.conf.ConfUtils: raft.server.log.appender.buffer.capacity = 134217728 (custom)
      2018-08-20 11:54:30,274 INFO org.apache.ratis.conf.ConfUtils: raft.server.log.appender.batch.enabled = true (custom)
      2018-08-20 11:54:30,274 INFO org.apache.ratis.conf.ConfUtils: raft.server.log.appender.snapshot.chunk.size.max = 16MB (=16777216) (default)
      2018-08-20 11:54:30,274 INFO org.apache.ratis.conf.ConfUtils: raft.grpc.server.leader.outstanding.appends.max = 128 (default)
      2018-08-20 11:54:30,274 INFO org.apache.ratis.conf.ConfUtils: raft.server.rpc.request.timeout = 3000 ms (custom)
      2018-08-20 11:54:30,277 INFO org.apache.ratis.server.storage.RaftLogWorker: Rolling segment:b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858-RaftLogWorker index to:20489
      2018-08-20 11:54:30,733 INFO org.apache.ratis.server.impl.RaftServerImpl: b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858-org.apache.ratis.server.impl.RoleInfo@465f9230: Withhold vote from candidate 2e240240-0fac-4f93-8aa8-fa8f74bf1810_9858 with term 15. State: leader=b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858, term=15, lastRpcElapsed=null
      2018-08-20 11:54:31,230 WARN org.apache.ratis.server.impl.LogAppender: b6aaaf2c-2cbf-498f-995c-09cb2bb97cf4_9858: Failed appendEntries to 2e240240-0fac-4f93-8aa8-fa8f74bf1810_9858:172.26.32.228:9858
      java.io.IOException: java.lang.IllegalStateException: Unexpected Index: previous is (t:14, i:20374) but entries[0].getIndex()=0
              at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:49)
              at org.apache.ratis.grpc.RaftGrpcUtil.tryUnwrapException(RaftGrpcUtil.java:81)
              at org.apache.ratis.grpc.RaftGrpcUtil.unwrapThrowable(RaftGrpcUtil.java:58)
              at org.apache.ratis.grpc.RaftGrpcUtil.warn(RaftGrpcUtil.java:129)
              at org.apache.ratis.grpc.server.GRpcLogAppender$AppendLogResponseHandler.onError(GRpcLogAppender.java:241)
              at org.apache.ratis.shaded.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:419)
              at org.apache.ratis.shaded.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:41)
              at org.apache.ratis.shaded.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:684)
              at org.apache.ratis.shaded.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:41)
              at org.apache.ratis.shaded.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:391)
              at org.apache.ratis.shaded.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:475)
              at org.apache.ratis.shaded.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
              at org.apache.ratis.shaded.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:557)
              at org.apache.ratis.shaded.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:478)
      

        Attachments

        1. HDDS-433.001.patch
          3 kB
          Lokesh Jain

          Activity

            People

            • Assignee:
              ljain Lokesh Jain
              Reporter:
              ljain Lokesh Jain
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: