Description
Saw this failure in ITClient test:
noformat
00:34:51.362 DEBUG - New I/O worker #10 (AsyncKuduScanner.java:492) Can not open scanner
org.apache.kudu.client.NonRecoverableException: Tablet hasn't heard from leader, or there hasn't been a stable leader for: 0.757s secs, (max is 0.750s):
at org.apache.kudu.client.RpcProxy.dispatchTSError(RpcProxy.java:320)
at org.apache.kudu.client.RpcProxy.responseReceived(RpcProxy.java:242)
at org.apache.kudu.client.RpcProxy.access$000(RpcProxy.java:59)
at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:131)
at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:127)
at org.apache.kudu.client.Connection.messageReceived(Connection.java:391)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.apache.kudu.client.Connection.handleUpstream(Connection.java:243)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.handler.timeout.ReadTimeoutHandler.messageReceived(ReadTimeoutHandler.java:184)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
noformat
There was a new leader elected just before:
noformat
00:34:50.953 [INFO - cluster stderr printer] (MiniKuduCluster.java:526) I0925 00:34:50.953722 18257 catalog_manager.cc:3758] T 07e99535cba24d8e991829485de22275 P 060cf49269fb4f6f901696d741e69303 reported cstate change: term changed from 1 to 2, leader changed from acaedc4ec505489dbc853d8b32bfc147 (127.16.196.2) to 060cf49269fb4f6f901696d741e69303 (127.16.196.3). New cstate: current_term: 2 leader_uuid: "060cf49269fb4f6f901696d741e69303" committed_config { opid_index: -1 OBSOLETE_local: false peers { permanent_uuid: "acaedc4ec505489dbc853d8b32bfc147" member_type: VOTER last_known_addr
health_report
{ overall_health: UNKNOWN }} peers { permanent_uuid: "060cf49269fb4f6f901696d741e69303" member_type: VOTER last_known_addr
{ host: "127.16.196.3" port: 37181 }health_report
{ overall_health: HEALTHY }} peers { permanent_uuid: "2f0ee942338c4fb7bce908b1c32f0bcb" member_type: VOTER last_known_addr
{ host: "127.16.196.1" port: 47865 }health_report
{ overall_health: UNKNOWN } } }
noformat
but it seems the 0.75 seconds the async scanner will wait for a leader partially overlapped with the election, and the remaining portion of the period wasn't enough to get an answer on who the new leader was.