Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-5957

Producer IllegalStateException due to second deallocate after aborting a batch

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: producer
    • Labels:
      None

      Description

      Saw this recently in a system test failure:

      [2017-09-21 05:04:52,033] ERROR [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Aborting producer batches due to fatal error (org.apache.kafka.clients.producer.internals.Sender)
      org.apache.kafka.common.KafkaException: The client hasn't received acknowledgment for some previously sent messages and can no longer retry them. It isn't safe to continue.
              at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211)
              at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:164)
              at java.lang.Thread.run(Thread.java:745)
      [2017-09-21 05:04:52,033] TRACE Aborting batch for partition output-topic-2 (org.apache.kafka.clients.producer.internals.ProducerBatch)
      org.apache.kafka.common.KafkaException: The client hasn't received acknowledgment for some previously sent messages and can no longer retry them. It isn't safe to continue.
              at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211)
              at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:164)
              at java.lang.Thread.run(Thread.java:745)
      [2017-09-21 05:04:52,134] TRACE [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Not sending transactional request (type=EndTxnRequest, transactionalId=my-second-transactional-id, producerId=1000, producerEpoch=0, result=COMMIT) because we are in an error state (org.apache.kafka.clients.producer.internals.TransactionManager)
      [2017-09-21 05:04:52,134] INFO [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. (org.apache.kafka.clients.producer.KafkaProducer)
      [2017-09-21 05:04:52,134] DEBUG [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Beginning shutdown of Kafka producer I/O thread, sending remaining records. (org.apache.kafka.clients.producer.internals.Sender)
      [2017-09-21 05:04:52,360] TRACE [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Received produce response from node 1 with correlation id 245 (org.apache.kafka.clients.producer.internals.Sender)
      [2017-09-21 05:04:52,360] DEBUG [Producer clientId=producer-1, transactionalId=my-second-transactional-id] ProducerId: 1000; Set last ack'd sequence number for topic-partition output-topic-2 to 136 (org.apache.kafka.clients.producer.internals.Sender)
      [2017-09-21 05:04:52,360] TRACE Successfully produced messages to output-topic-2 with base offset 387. (org.apache.kafka.clients.producer.internals.ProducerBatch)
      [2017-09-21 05:04:52,360] DEBUG ProduceResponse returned for output-topic-2 after batch had already been aborted. (org.apache.kafka.clients.producer.internals.ProducerBatch)
      [2017-09-21 05:04:52,360] ERROR [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Uncaught error in request completion: (org.apache.kafka.clients.NetworkClient)
      java.lang.IllegalStateException: Remove from the incomplete set failed. This should be impossible.
              at org.apache.kafka.clients.producer.internals.IncompleteBatches.remove(IncompleteBatches.java:44)
              at org.apache.kafka.clients.producer.internals.RecordAccumulator.deallocate(RecordAccumulator.java:612)
              at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:585)
              at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:561)
              at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:475)
              at org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74)
              at org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:685)
              at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:101)
              at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:481)
              at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:473)
              at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:225)
              at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:177)
              at java.lang.Thread.run(Thread.java:745)
      

      Although we allow a batch to be aborted before it returns, we are not careful about preventing a second call to deallocate() which causes this error.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                hachikuji Jason Gustafson
                Reporter:
                hachikuji Jason Gustafson
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: