Details
-
Bug
-
Status: Open
-
Not a Priority
-
Resolution: Unresolved
-
1.11.0
-
None
Description
When deploying a session cluster on Kubernetes via bin/kubernetes-session.sh, I see the following stack trace in the master logs:
2020-06-04 01:17:52,068 WARN org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint [] - Unhandled exception java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_252] at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_252] at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_252] at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_252] at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:377) ~[?:1.8.0_252] at org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:247) ~[flink-dist_2.11-1.11.0.jar:1.11.0] at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1140) ~[flink-dist_2.11-1.11.0.jar:1.11.0] at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:347) ~[flink-dist_2.11-1.11.0.jar:1.11.0] at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148) [flink-dist_2.11-1.11.0.jar:1.11.0] at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:697) [flink-dist_2.11-1.11.0.jar:1.11.0] at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:632) [flink-dist_2.11-1.11.0.jar:1.11.0] at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:549) [flink-dist_2.11-1.11.0.jar:1.11.0] at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:511) [flink-dist_2.11-1.11.0.jar:1.11.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918) [flink-dist_2.11-1.11.0.jar:1.11.0] at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [flink-dist_2.11-1.11.0.jar:1.11.0] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
I am not entirely sure whether this is a configuration problem or a K8s service which does some liveness checks? The consequence is that the JM logs are being cluttered with these stack traces.
Most likely this is not caused by Flink but some K8s behavior. The question is whether we can do something about it if it occurs often.