Description
Kafka allows users to configure custom SSLEngine via the SslEngineFactory interface. There have been attempts to use an OpenSSL based SSLEngine using netty-handler over the JDK based implementation for performance reasons.
While trying to use a Netty/Openssl based SSLEngine, we observe that the server hangs while performing the TLS handshake. We observe the following logs
2024-02-26 01:40:00,117 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] SslTransportLayer - [SslTransportLayer channelId=127.0.0.1:60045-127.0.0.1:60046-0 key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId 127.0.0.1:60045-127.0.0.1:60046-0 state NOT_INITIALIZED 2024-02-26 01:40:00,117 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [INFO ] Selector - [SocketServer listenerType=ZK_BROKER, nodeId=101] XX-Gaurav: calling prepare channelId 127.0.0.1:60045-127.0.0.1:60046-0 2024-02-26 01:40:00,117 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] SslTransportLayer - [SslTransportLayer channelId=127.0.0.1:60045-127.0.0.1:60046-0 key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId 127.0.0.1:60045-127.0.0.1:60046-0 state NOT_INITIALIZED 2024-02-26 01:40:00,117 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] SslTransportLayer - [SslTransportLayer channelId=127.0.0.1:60045-127.0.0.1:60046-0 key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId 127.0.0.1:60045-127.0.0.1:60046-0 state HANDSHAKE 2024-02-26 01:40:00,117 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] SslTransportLayer - [SslTransportLayer channelId=127.0.0.1:60045-127.0.0.1:60046-0 key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, interestOps=1, readyOps=0] SSLHandshake NEED_UNWRAP channelId 127.0.0.1:60045-127.0.0.1:60046-0, appReadBuffer pos 0, netReadBuffer pos 0, netWriteBuffer pos 0 2024-02-26 01:40:00,117 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] SslTransportLayer - [SslTransportLayer channelId=127.0.0.1:60045-127.0.0.1:60046-0 key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, interestOps=1, readyOps=0] SSLHandshake handshakeUnwrap channelId 127.0.0.1:60045-127.0.0.1:60046-0 doRead true 2024-02-26 01:40:00,118 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] SslTransportLayer - [SslTransportLayer channelId=127.0.0.1:60045-127.0.0.1:60046-0 key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, interestOps=1, readyOps=0] SSLHandshake post handshakeUnwrap: channelId 127.0.0.1:60045-127.0.0.1:60046-0 handshakeStatus NEED_UNWRAP status BUFFER_UNDERFLOW read 0 2024-02-26 01:40:00,118 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] SslTransportLayer - [SslTransportLayer channelId=127.0.0.1:60045-127.0.0.1:60046-0 key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, interestOps=1, readyOps=0] SSLHandshake post unwrap NEED_UNWRAP channelId 127.0.0.1:60045-127.0.0.1:60046-0, handshakeResult Status = BUFFER_UNDERFLOW HandshakeStatus = NEED_UNWRAP bytesConsumed = 0 bytesProduced = 0, appReadBuffer pos 0, netReadBuffer pos 0, netWriteBuffer pos 0 handshakeStatus NEED_UNWRAP 2024-02-26 01:40:00,118 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] SslTransportLayer - [SslTransportLayer channelId=127.0.0.1:60045-127.0.0.1:60046-0 key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId 127.0.0.1:60045-127.0.0.1:60046-0 state HANDSHAKE 2024-02-26 01:40:00,118 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] SslTransportLayer - [SslTransportLayer channelId=127.0.0.1:60045-127.0.0.1:60046-0 key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId 127.0.0.1:60045-127.0.0.1:60046-0 state HANDSHAKE 2024-02-26 01:40:00,118 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] SslTransportLayer - [SslTransportLayer channelId=127.0.0.1:60045-127.0.0.1:60046-0 key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId 127.0.0.1:60045-127.0.0.1:60046-0 state HANDSHAKE 2024-02-26 01:40:00,118 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] SslTransportLayer - [SslTransportLayer channelId=127.0.0.1:60045-127.0.0.1:60046-0 key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId 127.0.0.1:60045-127.0.0.1:60046-0 state HANDSHAKE 2024-02-26 01:40:00,118 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] SslTransportLayer - [SslTransportLayer channelId=127.0.0.1:60045-127.0.0.1:60046-0 key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId 127.0.0.1:60045-127.0.0.1:60046-0 state HANDSHAKE 2024-02-26 01:40:00,119 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] SslTransportLayer - [SslTransportLayer channelId=127.0.0.1:60045-127.0.0.1:60046-0 key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId 127.0.0.1:60045-127.0.0.1:60046-0 state HANDSHAKE 2024-02-26 01:40:00,119 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] SslTransportLayer - [SslTransportLayer channelId=127.0.0.1:60045-127.0.0.1:60046-0 key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId 127.0.0.1:60045-127.0.0.1:60046-0 state HANDSHAKE 2024-02-26 01:40:00,120 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [INFO ] Selector - [SocketServer listenerType=ZK_BROKER, nodeId=101] XX-Gaurav: calling prepare channelId 127.0.0.1:60045-127.0.0.1:60046-0 2024-02-26 01:40:00,120 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] SslTransportLayer - [SslTransportLayer channelId=127.0.0.1:60045-127.0.0.1:60046-0 key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId 127.0.0.1:60045-127.0.0.1:60046-0 state HANDSHAKE 2024-02-26 01:40:00,120 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] SslTransportLayer - [SslTransportLayer channelId=127.0.0.1:60045-127.0.0.1:60046-0 key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, interestOps=1, readyOps=0] XX-Gaurav ready isReady false channelId 127.0.0.1:60045-127.0.0.1:60046-0 state HANDSHAKE 2024-02-26 01:40:00,120 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] SslTransportLayer - [SslTransportLayer channelId=127.0.0.1:60045-127.0.0.1:60046-0 key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, interestOps=1, readyOps=0] SSLHandshake NEED_UNWRAP channelId 127.0.0.1:60045-127.0.0.1:60046-0, appReadBuffer pos 0, netReadBuffer pos 383, netWriteBuffer pos 0 2024-02-26 01:40:00,120 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] SslTransportLayer - [SslTransportLayer channelId=127.0.0.1:60045-127.0.0.1:60046-0 key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, interestOps=1, readyOps=0] SSLHandshake handshakeUnwrap channelId 127.0.0.1:60045-127.0.0.1:60046-0 doRead true 2024-02-26 01:40:00,122 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] SslTransportLayer - [SslTransportLayer channelId=127.0.0.1:60045-127.0.0.1:60046-0 key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, interestOps=1, readyOps=0] SSLHandshake post handshakeUnwrap: channelId 127.0.0.1:60045-127.0.0.1:60046-0 handshakeStatus NEED_UNWRAP status OK read 0 2024-02-26 01:40:00,122 data-plane-kafka-network-thread-101-ListenerName(TEST)-SASL_SSL-0 [TRACE] SslTransportLayer - [SslTransportLayer channelId=127.0.0.1:60045-127.0.0.1:60046-0 key=channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:60045 remote=/127.0.0.1:60046], selector=sun.nio.ch.KQueueSelectorImpl@18d201be, interestOps=1, readyOps=0] SSLHandshake post unwrap NEED_UNWRAP channelId 127.0.0.1:60045-127.0.0.1:60046-0, handshakeResult Status = OK HandshakeStatus = NEED_TASK bytesConsumed = 383 bytesProduced = 0, appReadBuffer pos 0, netReadBuffer pos 0, netWriteBuffer pos 0 handshakeStatus NEED_UNWRAP
Lines 21 and 22 are of interest here as we observe that:
1. handshakeStatus is NEED_UNWRAP and
2. netReadBuffer.position() == 0
The do-while loop in handshakeUnwrap() is exited because of (2) and as a result, the SSLEngine's state machine isn't handled properly.
We should therefore, avoid the optimisation and simplify the condition to be just while(cont).
Attachments
Issue Links
- links to