Kafka / KAFKA-14830

Illegal state error in transactional producer

Details

    • Type: Bug
    • Status: In Progress
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.1.2
    • Fix Version/s: 4.0.0
    • Component/s: clients, producer

    Description

      We have seen the following illegal state error in the producer:

      [Producer clientId=client-id2, transactionalId=transactional-id] Transiting to abortable error state due to org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for topic-0:120027 ms has passed since batch creation
      [Producer clientId=client-id2, transactionalId=transactional-id] Transiting to abortable error state due to org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for topic-1:120026 ms has passed since batch creation
      [Producer clientId=client-id2, transactionalId=transactional-id] Aborting incomplete transaction
      [Producer clientId=client-id2, transactionalId=transactional-id] Invoking InitProducerId with current producer ID and epoch ProducerIdAndEpoch(producerId=191799, epoch=0) in order to bump the epoch
      [Producer clientId=client-id2, transactionalId=transactional-id] ProducerId set to 191799 with epoch 1
      [Producer clientId=client-id2, transactionalId=transactional-id] Transiting to abortable error state due to org.apache.kafka.common.errors.NetworkException: Disconnected from node 4
      [Producer clientId=client-id2, transactionalId=transactional-id] Transiting to abortable error state due to org.apache.kafka.common.errors.TimeoutException: The request timed out.
      [Producer clientId=client-id2, transactionalId=transactional-id] Uncaught error in request completion:
      java.lang.IllegalStateException: TransactionalId transactional-id: Invalid transition attempted from state READY to state ABORTABLE_ERROR
              at org.apache.kafka.clients.producer.internals.TransactionManager.transitionTo(TransactionManager.java:1089)
              at org.apache.kafka.clients.producer.internals.TransactionManager.transitionToAbortableError(TransactionManager.java:508)
              at org.apache.kafka.clients.producer.internals.TransactionManager.maybeTransitionToErrorState(TransactionManager.java:734)
              at org.apache.kafka.clients.producer.internals.TransactionManager.handleFailedBatch(TransactionManager.java:739)
              at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:753)
              at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:743)
              at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:695)
              at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:634)
              at org.apache.kafka.clients.producer.internals.Sender.lambda$null$1(Sender.java:575)
              at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
              at org.apache.kafka.clients.producer.internals.Sender.lambda$handleProduceResponse$2(Sender.java:562)
              at java.base/java.lang.Iterable.forEach(Iterable.java:75)
              at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:562)
              at org.apache.kafka.clients.producer.internals.Sender.lambda$sendProduceRequest$5(Sender.java:836)
              at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
              at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:583)
              at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:575)
              at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:328)
              at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:243)
              at java.base/java.lang.Thread.run(Thread.java:829)
       

      The producer hits timeouts, which cause it to abort the active transaction. After the abort, the producer bumps its epoch, which transitions it back to the `READY` state. Two errors for in-flight requests then arrive, and handling them attempts an illegal state transition from `READY` to `ABORTABLE_ERROR`. But how could the transaction abort complete while there were still in-flight requests?
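
      The sequence described above can be replayed against a small model of the state machine. This is a minimal sketch, not the real `TransactionManager`: the class `TxnStateSketch`, its reduced set of states, and its transition table are assumptions made for illustration. Only the state names and the wording of the exception message are taken from the log above.

```java
import java.util.EnumSet;
import java.util.Set;

// Simplified, hypothetical model of a transactional producer's states.
// It is NOT the real TransactionManager; the transition table is an assumption.
final class TxnStateSketch {
    enum State { READY, IN_TRANSACTION, ABORTING_TRANSACTION, ABORTABLE_ERROR }

    private State current = State.READY;

    // Assumed rule set: the source states that may move to a given target.
    static Set<State> allowedSources(State target) {
        switch (target) {
            case READY:
                return EnumSet.of(State.ABORTING_TRANSACTION);
            case IN_TRANSACTION:
                return EnumSet.of(State.READY);
            case ABORTING_TRANSACTION:
                return EnumSet.of(State.IN_TRANSACTION, State.ABORTABLE_ERROR);
            case ABORTABLE_ERROR:
                return EnumSet.of(State.IN_TRANSACTION, State.ABORTABLE_ERROR);
            default:
                return EnumSet.noneOf(State.class);
        }
    }

    // Rejects any transition that the table above does not allow.
    void transitionTo(State target) {
        if (!allowedSources(target).contains(current)) {
            throw new IllegalStateException(
                "Invalid transition attempted from state " + current + " to state " + target);
        }
        current = target;
    }

    State current() {
        return current;
    }

    public static void main(String[] args) {
        TxnStateSketch txn = new TxnStateSketch();
        txn.transitionTo(State.IN_TRANSACTION);        // transaction begins
        txn.transitionTo(State.ABORTABLE_ERROR);       // first batch expires
        txn.transitionTo(State.ABORTABLE_ERROR);       // second batch expires
        txn.transitionTo(State.ABORTING_TRANSACTION);  // abort starts
        txn.transitionTo(State.READY);                 // abort completes, epoch bumped
        try {
            txn.transitionTo(State.ABORTABLE_ERROR);   // late failure of an in-flight batch
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      In this model the last call fails because a batch failure is reported after the state has already returned to `READY`, which mirrors the ordering in the log. The sketch does not explain why the abort completed with requests still in flight; that remains the open question.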

            People

              Assignee: Kirk True (kirktrue)
              Reporter: Jason Gustafson (hachikuji)
              Votes: 0
              Watchers: 5
