When consumer uses plaintext and there is remaining data in consumer's buffer, consumer.poll() will read all data available from the socket buffer to consumer buffer. However, if consumer uses ssl and there is remaining data, consumer.poll() may only read 16 KB (the size of SslTransportLayer.appReadBuffer) from socket buffer. This will reduce efficient of consumer.poll() by asking user to call more poll() to get the same amount of data.
Furthermore, we observe that for users who naively sleep a constant time after each consumer.poll(), some partition will lag behind after they switch from plaintext to ssl. Here is the explanation why this can happen.
Say there are 1 partition of 1MB/sec and 9 partition of 32KB/sec. Leaders of these partitions are all different and consumer is consuming these 10 partitions. Let's also assume that socket read buffer size is large enough and consume sleeps 1 sec between consumer.poll(). 1 sec is long enough for consumer to receive the FetchResponse back from broker.
- When consumer uses plaintext, each consumer.poll() will read all data from the socket buffer and it means 1 MB data is read from each partition.
- When consumer uses ssl, each consumer.poll() is likely to find that there is some data available in the memory. In this case consumer only reads 16 KB data from other sockets, particularly the socket for the broker with the large partition. Then the throughput of the large partition will be limited to 16KB/sec.
Arguably user should not sleep 1 sec if its consumer is lagging behind. But on Kafka dev side it is nice to keep the previous behavior and optimize consumer.poll() to read as much data from socket as possible.