Description
I've seen this a few times in system tests:
[2017-06-10 19:47:38,434] ERROR Uncaught error in request completion: (org.apache.kafka.clients.NetworkClient)
java.lang.IllegalStateException: Batch has already been completed
    at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:157)
    at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:576)
    at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:555)
    at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:479)
    at org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:75)
    at org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:666)
    at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:101)
    at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:454)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:446)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:206)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
    at java.lang.Thread.run(Thread.java:745)
[2
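I think this is probably caused by aborting in-flight batches after the producer enters an error state: the abort completes the batch once, and the produce response that arrives afterwards tries to complete it a second time. See the following log: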
[2017-06-10 19:47:38,425] ERROR Aborting producer batches due to fatal error (org.apache.kafka.clients.producer.internals.Sender)
org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker received an out of order sequence number
[2017-06-10 19:47:38,425] DEBUG [TransactionalId my-first-transactional-id] Transition from state ABORTABLE_ERROR to ABORTING_TRANSACTION (org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-10 19:47:38,425] TRACE Produced messages to topic-partition output-topic-0 with base offset offset -1 and error: {}. (org.apache.kafka.clients.producer.internals.ProducerBatch)
org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker received an out of order sequence number
[2017-06-10 19:47:38,425] DEBUG [TransactionalId my-first-transactional-id] Enqueuing transactional request (type=EndTxnRequest, transactionalId=my-first-transactional-id, producerId=2000, producerEpoch=0, result=ABORT) (org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-10 19:47:38,426] TRACE [TransactionalId my-first-transactional-id] Request (type=EndTxnRequest, transactionalId=my-first-transactional-id, producerId=2000, producerEpoch=0, result=ABORT) dequeued for sending (org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-10 19:47:38,426] DEBUG [TransactionalId my-first-transactional-id] Sending transactional request (type=EndTxnRequest, transactionalId=my-first-transactional-id, producerId=2000, producerEpoch=0, result=ABORT) to node worker11:9092 (id: 3 rack: null) (org.apache.kafka.clients.producer.internals.Sender)
[2017-06-10 19:47:38,434] TRACE Received produce response from node 2 with correlation id 514 (org.apache.kafka.clients.producer.internals.Sender)
[2017-06-10 19:47:38,434] DEBUG Incremented sequence number for topic-partition output-topic-0 to 4500 (org.apache.kafka.clients.producer.internals.Sender)
[2017-06-10 19:47:38,434] TRACE Produced messages to topic-partition output-topic-0 with base offset offset 7033 and error: null. (org.apache.kafka.clients.producer.internals.ProducerBatch)
[2017-06-10 19:47:38,434] ERROR Uncaught error in request completion: (org.apache.kafka.clients.NetworkClient)
java.lang.IllegalStateException: Batch has already been completed
A simple solution is to add a separate flag to indicate that the batch has been aborted. We can check it when the response returns and skip the callback.
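To make the proposal concrete, here is a rough sketch of how such a flag might work. This is not the actual ProducerBatch code: the class AbortableBatchSketch, its nested Batch type, the FinalState enum, and the method signatures are all hypothetical and only illustrate the idea of recording the abort and ignoring the late response.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch only; names and signatures are assumptions,
// not the real org.apache.kafka.clients.producer.internals.ProducerBatch.
public class AbortableBatchSketch {

    public enum FinalState { ABORTED, FAILED, SUCCEEDED }

    public static final class Batch {
        // null means the batch has not been completed yet.
        private final AtomicReference<FinalState> finalState = new AtomicReference<>(null);
        private int callbackInvocations = 0;

        // Called when the sender aborts in-flight batches after a fatal error.
        public void abort(RuntimeException cause) {
            if (!finalState.compareAndSet(null, FinalState.ABORTED)) {
                throw new IllegalStateException(
                    "Batch has already been completed in final state " + finalState.get());
            }
            fireCallbacks(-1L, cause);
        }

        // Called when the produce response returns.
        // Returns true if this call completed the batch, false if it was skipped.
        public boolean done(long baseOffset, RuntimeException exception) {
            FinalState target = (exception == null) ? FinalState.SUCCEEDED : FinalState.FAILED;
            if (finalState.compareAndSet(null, target)) {
                fireCallbacks(baseOffset, exception);
                return true;
            }
            if (finalState.get() == FinalState.ABORTED) {
                // The batch was aborted while in flight: ignore the late
                // response instead of throwing, and do not fire callbacks again.
                return false;
            }
            throw new IllegalStateException("Batch has already been completed");
        }

        public FinalState finalState() {
            return finalState.get();
        }

        public int callbackInvocations() {
            return callbackInvocations;
        }

        // Stand-in for invoking the user callbacks; here it only counts calls.
        private void fireCallbacks(long baseOffset, RuntimeException exception) {
            callbackInvocations++;
        }
    }

    public static void main(String[] args) {
        Batch batch = new Batch();
        batch.abort(new RuntimeException("out of order sequence number"));
        // The produce response arrives after the abort, as in the log above.
        boolean completed = batch.done(7033L, null);
        System.out.println("completed by response: " + completed);
        System.out.println("final state: " + batch.finalState());
        System.out.println("callback invocations: " + batch.callbackInvocations());
    }
}
```

In this sketch a genuine double completion (two responses for the same batch) still throws, so the check only relaxes the case where the first completion was an abort.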