Description
Saw this recently in a system test failure:
[2017-09-21 05:04:52,033] ERROR [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Aborting producer batches due to fatal error (org.apache.kafka.clients.producer.internals.Sender) org.apache.kafka.common.KafkaException: The client hasn't received acknowledgment for some previously sent messages and can no longer retry them. It isn't safe to continue. at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:164) at java.lang.Thread.run(Thread.java:745) [2017-09-21 05:04:52,033] TRACE Aborting batch for partition output-topic-2 (org.apache.kafka.clients.producer.internals.ProducerBatch) org.apache.kafka.common.KafkaException: The client hasn't received acknowledgment for some previously sent messages and can no longer retry them. It isn't safe to continue. at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:164) at java.lang.Thread.run(Thread.java:745) [2017-09-21 05:04:52,134] TRACE [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Not sending transactional request (type=EndTxnRequest, transactionalId=my-second-transactional-id, producerId=1000, producerEpoch=0, result=COMMIT) because we are in an error state (org.apache.kafka.clients.producer.internals.TransactionManager) [2017-09-21 05:04:52,134] INFO [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. (org.apache.kafka.clients.producer.KafkaProducer) [2017-09-21 05:04:52,134] DEBUG [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Beginning shutdown of Kafka producer I/O thread, sending remaining records. (org.apache.kafka.clients.producer.internals.Sender) [2017-09-21 05:04:52,360] TRACE [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Received produce response from node 1 with correlation id 245 (org.apache.kafka.clients.producer.internals.Sender) [2017-09-21 05:04:52,360] DEBUG [Producer clientId=producer-1, transactionalId=my-second-transactional-id] ProducerId: 1000; Set last ack'd sequence number for topic-partition output-topic-2 to 136 (org.apache.kafka.clients.producer.internals.Sender) [2017-09-21 05:04:52,360] TRACE Successfully produced messages to output-topic-2 with base offset 387. (org.apache.kafka.clients.producer.internals.ProducerBatch) [2017-09-21 05:04:52,360] DEBUG ProduceResponse returned for output-topic-2 after batch had already been aborted. (org.apache.kafka.clients.producer.internals.ProducerBatch) [2017-09-21 05:04:52,360] ERROR [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Uncaught error in request completion: (org.apache.kafka.clients.NetworkClient) java.lang.IllegalStateException: Remove from the incomplete set failed. This should be impossible. at org.apache.kafka.clients.producer.internals.IncompleteBatches.remove(IncompleteBatches.java:44) at org.apache.kafka.clients.producer.internals.RecordAccumulator.deallocate(RecordAccumulator.java:612) at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:585) at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:561) at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:475) at org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74) at org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:685) at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:101) at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:481) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:473) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:225) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:177) at java.lang.Thread.run(Thread.java:745)
Although we allow a batch to be aborted before it returns, we are not careful about preventing a second call to deallocate() which causes this error.
Attachments
Issue Links
- links to