Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
We recently observed an issue in production that can apparently occur a small percentage of the time when a Kafka broker is stopped. We're using version 0.9.0.1 for all brokers and clients.
During a recent episode, 3 KafkaConsumer instances (out of approximately 100) threw the following SchemaException within a few seconds of the broker being instructed to shut down.
2017-01-04 14:46:19 org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'responses': Error reading array of size 2774863, only 62 bytes available
    at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:71)
    at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:439)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
    at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853)
The exception message was slightly different for one consumer:
Error reading field 'responses': Error reading array of size 2774863, only 260 bytes available
The exception was not caught and caused the Storm Executor thread to restart, so it's not clear if it would have been transient or fatal for the KafkaConsumer.
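For readers unfamiliar with the wording of the messages above, here is a minimal sketch of how a reader of a length-prefixed array can produce an error of this shape. It is not the Kafka client's actual code: the class name, the exception type, and the choice of 4-byte int elements are all assumptions made for illustration. The point is only that the reader takes the next 4 bytes as the element count and rejects the buffer when that count cannot fit in the bytes that remain. That is what one would expect if the bytes being parsed were not a complete, correctly framed response, although this report does not establish the cause.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only; names here are hypothetical, not Kafka classes.
class ArraySizeCheckSketch {

    /** Hypothetical stand-in for the client's schema exception. */
    static class SketchSchemaException extends RuntimeException {
        SketchSchemaException(String message) {
            super(message);
        }
    }

    /**
     * Reads a 4-byte element count followed by that many 4-byte ints.
     * Throws if the declared count cannot fit in the remaining bytes.
     */
    static int[] readIntArray(ByteBuffer buffer) {
        int size = buffer.getInt();
        int available = buffer.remaining();
        // Use long arithmetic so a huge declared size cannot overflow.
        if (size < 0 || (long) size * Integer.BYTES > available) {
            throw new SketchSchemaException("Error reading array of size " + size
                    + ", only " + available + " bytes available");
        }
        int[] values = new int[size];
        for (int i = 0; i < size; i++) {
            values[i] = buffer.getInt();
        }
        return values;
    }

    public static void main(String[] args) {
        // A buffer whose first 4 bytes decode to 2774863, followed by 62 bytes.
        ByteBuffer buffer = ByteBuffer.allocate(4 + 62);
        buffer.putInt(2774863);
        buffer.rewind();
        try {
            readIntArray(buffer);
        } catch (SketchSchemaException e) {
            // prints "Error reading array of size 2774863, only 62 bytes available"
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the declared size comes straight from the buffer, so any misalignment or truncation of the underlying bytes surfaces as an implausibly large array size rather than as a more descriptive error.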
Here are the initial broker shutdown logs:
2017-01-04 14:46:15,869 INFO kafka.server.KafkaServer: [Kafka Server 4], shutting down
2017-01-04 14:46:16,298 INFO kafka.server.ReplicaFetcherThread: [ReplicaFetcherThread-1-40], Shutting down
2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: [ReplicaFetcherThread-1-40], Stopped
2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: [ReplicaFetcherThread-1-40], Shutdown completed
2017-01-04 14:46:18,612 INFO kafka.server.ReplicaFetcherThread: [ReplicaFetcherThread-3-30], Shutting down
2017-01-04 14:46:19,547 INFO kafka.server.KafkaServer: [Kafka Server 4], Controlled shutdown succeeded
2017-01-04 14:46:19,554 INFO kafka.network.SocketServer: [Socket Server on Broker 4], Shutting down
2017-01-04 14:46:19,593 INFO kafka.network.SocketServer: [Socket Server on Broker 4], Shutdown completed
We've found one very similar reported occurrence:
http://mail-archives.apache.org/mod_mbox/kafka-users/201605.mbox/%3CCAGnq0kFPm%2Bd0Xdm4tY_O7MnV3_LqLU10uDhPwxzv-T7UnHy08g%40mail.gmail.com%3E