Description
RaftClient does not change leaderId and does not refresh the streamObserver for the old leaderId in case of TimeoutIOException. For AlreadyClosedException also the raft client should retry to a different leader.
For TimeoutIOException leader change is good to have if the server has false notion of being the leader. Similarly for AlreadyClosedException I guess it will be good to try sending request to a different server since the existing stream might be closed by server.
19/09/25 18:09:42 DEBUG client.RaftClient: client-E5E9D444D128: suggested new leader: null. Failed RaftClientRequest:client-E5E9D444D128->46001d60-9ca0-4042-a521-69cb38b96882@group-0B8E59E48142, cid=34, seq=11*, RW, org.apache.hadoop.hdds.scm.XceiverClientRatis$$Lambda$72/2083154356@108d55c4 with {} 19/09/25 18:09:42 DEBUG client.RaftClient: client-E5E9D444D128: suggested new leader: null. Failed RaftClientRequest:client-E5E9D444D128->46001d60-9ca0-4042-a521-69cb38b96882@group-0B8E59E48142, cid=34, seq=11*, RW, org.apache.hadoop.hdds.scm.XceiverClientRatis$$Lambda$72/2083154356@108d55c4 with {}org.apache.ratis.protocol.TimeoutIOException: Request timeout 3000ms: RaftClientRequest:client-E5E9D444D128->46001d60-9ca0-4042-a521-69cb38b96882@group-0B8E59E48142, cid=34, seq=11*, RW, org.apache.hadoop.hdds.scm.XceiverClientRatis$$Lambda$72/2083154356@108d55c4 at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.lambda$timeoutCheck$3(GrpcClientProtocolClient.java:319) at java.util.Optional.ifPresent(Optional.java:159) at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:323) at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.timeoutCheck(GrpcClientProtocolClient.java:318) at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.lambda$onNext$1(GrpcClientProtocolClient.java:312) at org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:113) at org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:133) at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50) at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 19/09/25 18:09:42 DEBUG client.RaftClient: client-E5E9D444D128: oldLeader=46001d60-9ca0-4042-a521-69cb38b96882, curLeader=46001d60-9ca0-4042-a521-69cb38b96882, newLeader=394810ca-48fe-47bd-a0c8-eeaaec380bc319/09/25 18:09:42 DEBUG impl.OrderedAsync: schedule* attempt #171 with policy RetryLimited(maxAttempts=180, sleepTime=1000ms) for RaftClientRequest:client-E5E9D444D128->46001d60-9ca0-4042-a521-69cb38b96882@group-0B8E59E48142, cid=34, seq=11*, RW, org.apache.hadoop.hdds.scm.XceiverClientRatis$$Lambda$72/2083154356@108d55c4 19/09/25 18:09:43 DEBUG impl.OrderedAsync: client-E5E9D444D128: send* RaftClientRequest:client-E5E9D444D128->46001d60-9ca0-4042-a521-69cb38b96882@group-0B8E59E48142, cid=34, seq=11*, RW, org.apache.hadoop.hdds.scm.XceiverClientRatis$$Lambda$72/2083154356@108d55c4 19/09/25 18:09:43 DEBUG client.RaftClient: client-E5E9D444D128: suggested new leader: null. Failed RaftClientRequest:client-E5E9D444D128->46001d60-9ca0-4042-a521-69cb38b96882@group-0B8E59E48142, cid=34, seq=11*, RW, org.apache.hadoop.hdds.scm.XceiverClientRatis$$Lambda$72/2083154356@108d55c4 with {}java.io.IOException: java.lang.IllegalStateException: Entry already exists for key 11 in map SlidingWindow$Server:257-OrderedRequestStreamObserver257:requests at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54) at org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:106) at org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75) at org.apache.ratis.grpc.GrpcUtil.unwrapIOException(GrpcUtil.java:128) at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onError(GrpcClientProtocolClient.java:283) at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434) at org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) at org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678) at org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) at org.apache.ratis.thirdparty.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397) at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459) at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63) at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546) at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467) at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584) at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)Caused by: java.lang.IllegalStateException: Entry already exists for key 11 in map SlidingWindow$Server:257-OrderedRequestStreamObserver257:requests at sun.reflect.GeneratedConstructorAccessor22.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.ratis.util.ReflectionUtils.instantiateException(ReflectionUtils.java:222) at org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:104) ... 22 more
Attachments
Attachments
Issue Links
- relates to
-
RATIS-601 Fix NotLeaderException handling
- Resolved