Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-5957

Producer IllegalStateException due to second deallocate after aborting a batch

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • None
    • 1.0.0
    • producer
    • None

    Description

      Saw this recently in a system test failure:

      [2017-09-21 05:04:52,033] ERROR [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Aborting producer batches due to fatal error (org.apache.kafka.clients.producer.internals.Sender)
      org.apache.kafka.common.KafkaException: The client hasn't received acknowledgment for some previously sent messages and can no longer retry them. It isn't safe to continue.
              at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211)
              at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:164)
              at java.lang.Thread.run(Thread.java:745)
      [2017-09-21 05:04:52,033] TRACE Aborting batch for partition output-topic-2 (org.apache.kafka.clients.producer.internals.ProducerBatch)
      org.apache.kafka.common.KafkaException: The client hasn't received acknowledgment for some previously sent messages and can no longer retry them. It isn't safe to continue.
              at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211)
              at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:164)
              at java.lang.Thread.run(Thread.java:745)
      [2017-09-21 05:04:52,134] TRACE [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Not sending transactional request (type=EndTxnRequest, transactionalId=my-second-transactional-id, producerId=1000, producerEpoch=0, result=COMMIT) because we are in an error state (org.apache.kafka.clients.producer.internals.TransactionManager)
      [2017-09-21 05:04:52,134] INFO [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. (org.apache.kafka.clients.producer.KafkaProducer)
      [2017-09-21 05:04:52,134] DEBUG [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Beginning shutdown of Kafka producer I/O thread, sending remaining records. (org.apache.kafka.clients.producer.internals.Sender)
      [2017-09-21 05:04:52,360] TRACE [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Received produce response from node 1 with correlation id 245 (org.apache.kafka.clients.producer.internals.Sender)
      [2017-09-21 05:04:52,360] DEBUG [Producer clientId=producer-1, transactionalId=my-second-transactional-id] ProducerId: 1000; Set last ack'd sequence number for topic-partition output-topic-2 to 136 (org.apache.kafka.clients.producer.internals.Sender)
      [2017-09-21 05:04:52,360] TRACE Successfully produced messages to output-topic-2 with base offset 387. (org.apache.kafka.clients.producer.internals.ProducerBatch)
      [2017-09-21 05:04:52,360] DEBUG ProduceResponse returned for output-topic-2 after batch had already been aborted. (org.apache.kafka.clients.producer.internals.ProducerBatch)
      [2017-09-21 05:04:52,360] ERROR [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Uncaught error in request completion: (org.apache.kafka.clients.NetworkClient)
      java.lang.IllegalStateException: Remove from the incomplete set failed. This should be impossible.
              at org.apache.kafka.clients.producer.internals.IncompleteBatches.remove(IncompleteBatches.java:44)
              at org.apache.kafka.clients.producer.internals.RecordAccumulator.deallocate(RecordAccumulator.java:612)
              at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:585)
              at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:561)
              at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:475)
              at org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74)
              at org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:685)
              at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:101)
              at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:481)
              at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:473)
              at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:225)
              at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:177)
              at java.lang.Thread.run(Thread.java:745)
      

      Although we allow a batch to be aborted before it returns, we are not careful about preventing a second call to deallocate() which causes this error.

      Attachments

        Issue Links

          Activity

            People

              hachikuji Jason Gustafson
              hachikuji Jason Gustafson
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: