Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0.0
    • Component/s: clients, core, producer
    • Labels:
      None

      Description

      I've seen this a few times in system tests:

      [2017-06-10 19:47:38,434] ERROR Uncaught error in request completion: (org.apache.kafka.clients.NetworkClient)
      java.lang.IllegalStateException: Batch has already been completed
              at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:157)
              at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:576)
              at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:555)
              at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:479)
              at org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:75)
              at org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:666)
              at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:101)
              at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:454)
              at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:446)
              at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:206)
              at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
              at java.lang.Thread.run(Thread.java:745)

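      I think this is probably caused by aborting in-flight batches after the producer enters an error state: the abort completes the batch, and the produce response that arrives afterwards tries to complete it a second time. See the following log: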

      [2017-06-10 19:47:38,425] ERROR Aborting producer batches due to fatal error (org.apache.kafka.clients.producer.internals.Sender)
      org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker received an out of order sequence number
      [2017-06-10 19:47:38,425] DEBUG [TransactionalId my-first-transactional-id] Transition from state ABORTABLE_ERROR to ABORTING_TRANSACTION (org.apache.kafka.clients.producer.internals.TransactionManager)
      [2017-06-10 19:47:38,425] TRACE Produced messages to topic-partition output-topic-0 with base offset offset -1 and error: {}. (org.apache.kafka.clients.producer.internals.ProducerBatch)
      org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker received an out of order sequence number
      [2017-06-10 19:47:38,425] DEBUG [TransactionalId my-first-transactional-id] Enqueuing transactional request (type=EndTxnRequest, transactionalId=my-first-transactional-id, producerId=2000, producerEpoch=0, result=ABORT) (org.apache.kafka.clients.producer.internals.TransactionManager)
      [2017-06-10 19:47:38,426] TRACE [TransactionalId my-first-transactional-id] Request (type=EndTxnRequest, transactionalId=my-first-transactional-id, producerId=2000, producerEpoch=0, result=ABORT) dequeued for sending (org.apache.kafka.clients.producer.internals.TransactionManager)
      [2017-06-10 19:47:38,426] DEBUG [TransactionalId my-first-transactional-id] Sending transactional request (type=EndTxnRequest, transactionalId=my-first-transactional-id, producerId=2000, producerEpoch=0, result=ABORT) to node worker11:9092 (id: 3 rack: null) (org.apache.kafka.clients.producer.internals.Sender)
      [2017-06-10 19:47:38,434] TRACE Received produce response from node 2 with correlation id 514 (org.apache.kafka.clients.producer.internals.Sender)
      [2017-06-10 19:47:38,434] DEBUG Incremented sequence number for topic-partition output-topic-0 to 4500 (org.apache.kafka.clients.producer.internals.Sender)
      [2017-06-10 19:47:38,434] TRACE Produced messages to topic-partition output-topic-0 with base offset offset 7033 and error: null. (org.apache.kafka.clients.producer.internals.ProducerBatch)
      [2017-06-10 19:47:38,434] ERROR Uncaught error in request completion: (org.apache.kafka.clients.NetworkClient)
      java.lang.IllegalStateException: Batch has already been completed
      

      A simple solution is to add a separate flag indicating that the batch has been aborted. When the response returns, we can check the flag and skip the callback.
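
      A minimal sketch of that idea, to make the proposal concrete. It is not the actual patch: AbortableBatch, AbortableBatchDemo, and all of their methods are hypothetical names that stand in for the real ProducerBatch and Sender code, and the offset 7033 is simply the one from the log above.

```java
// Hypothetical stand-in for a producer batch; this is NOT Kafka's ProducerBatch.
final class AbortableBatch {
    private boolean completed = false;    // set once the completion callback has fired
    private boolean aborted = false;      // the separate flag proposed in this issue
    private int callbackInvocations = 0;  // counts how often the callback ran
    private long baseOffset = -1L;
    private RuntimeException error;

    // Called when in-flight batches are aborted after a fatal error.
    synchronized void abort(RuntimeException cause) {
        if (completed) {
            throw new IllegalStateException("Batch has already been completed");
        }
        aborted = true;
        complete(-1L, cause);
    }

    // Called when the produce response arrives.
    // Returns false if the response was ignored because the batch was aborted.
    synchronized boolean done(long responseBaseOffset, RuntimeException responseError) {
        if (aborted) {
            return false; // late response for an aborted batch: skip the callback
        }
        if (completed) {
            throw new IllegalStateException("Batch has already been completed");
        }
        complete(responseBaseOffset, responseError);
        return true;
    }

    // Records the outcome and fires the (simulated) callback exactly once.
    private void complete(long offset, RuntimeException e) {
        completed = true;
        baseOffset = offset;
        error = e;
        callbackInvocations++;
    }

    synchronized int callbackInvocations() { return callbackInvocations; }
    synchronized long baseOffset() { return baseOffset; }
    synchronized RuntimeException error() { return error; }
}

final class AbortableBatchDemo {
    public static void main(String[] args) {
        AbortableBatch batch = new AbortableBatch();
        // The batch is aborted first, as in the log at 19:47:38,425.
        batch.abort(new IllegalStateException("fatal error"));
        // The successful produce response arrives afterwards, as at 19:47:38,434.
        boolean applied = batch.done(7033L, null);
        System.out.println("late response applied: " + applied);
        System.out.println("callback invocations: " + batch.callbackInvocations());
    }
}
```

      In this sketch a genuine double completion of a batch that was never aborted still throws, so the check only hides the one ordering that is expected after an abort.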

              People

              • Assignee: Jason Gustafson (hachikuji)
              • Reporter: Jason Gustafson (hachikuji)