Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
We noticed this exception in the logs:
java.lang.IllegalStateException: TransactionalId foo completing transaction state transition while it does not have a pending state
at kafka.coordinator.transaction.TransactionMetadata.$anonfun$completeTransitionTo$1(TransactionMetadata.scala:357)
at kafka.coordinator.transaction.TransactionMetadata.completeTransitionTo(TransactionMetadata.scala:353)
at kafka.coordinator.transaction.TransactionStateManager.$anonfun$appendTransactionToLog$3(TransactionStateManager.scala:595)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at kafka.coordinator.transaction.TransactionMetadata.inLock(TransactionMetadata.scala:188)
at kafka.coordinator.transaction.TransactionStateManager.$anonfun$appendTransactionToLog$15$adapted(TransactionStateManager.scala:587)
at kafka.server.DelayedProduce.onComplete(DelayedProduce.scala:126)
at kafka.server.DelayedOperation.forceComplete(DelayedOperation.scala:70)
at kafka.server.DelayedProduce.tryComplete(DelayedProduce.scala:107)
at kafka.server.DelayedOperation.maybeTryComplete(DelayedOperation.scala:121)
at kafka.server.DelayedOperationPurgatory$Watchers.tryCompleteWatched(DelayedOperation.scala:378)
at kafka.server.DelayedOperationPurgatory.checkAndComplete(DelayedOperation.scala:280)
at kafka.cluster.DelayedOperations.checkAndCompleteAll(Partition.scala:122)
at kafka.cluster.Partition.tryCompleteDelayedRequests(Partition.scala:1023)
at kafka.cluster.Partition.updateFollowerFetchState(Partition.scala:740)
After inspection, we found two CompleteCommit entries in the transaction state log, which explains the failed transition: the second entry's completion callback ran after the first had already cleared the pending state. The logic for writing the CompleteCommit message does appear to be prone to race conditions.
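The failure mode can be pictured with a minimal sketch. This is not Kafka's code: `PendingStateSketch`, `TxnMetadata`, and `secondCompletionFails` are hypothetical names, and the model assumes only what the exception message implies, namely that completing a transition requires a pending state and clears it. How the duplicate CompleteCommit write comes about in the coordinator is not modeled here.

```java
import java.util.Optional;

public class PendingStateSketch {
    enum State { PREPARE_COMMIT, COMPLETE_COMMIT }

    // Hypothetical, simplified stand-in for per-transaction metadata.
    static final class TxnMetadata {
        private final String transactionalId;
        private State state;
        private Optional<State> pendingState = Optional.empty();

        TxnMetadata(String transactionalId, State initial) {
            this.transactionalId = transactionalId;
            this.state = initial;
        }

        // Record the target state before the log append is issued.
        synchronized void prepareTransitionTo(State target) {
            if (pendingState.isPresent()) {
                throw new IllegalStateException("TransactionalId " + transactionalId
                        + " already has a pending state");
            }
            pendingState = Optional.of(target);
        }

        // Called from the append callback; requires a matching pending state.
        synchronized void completeTransitionTo(State target) {
            if (!pendingState.isPresent() || pendingState.get() != target) {
                throw new IllegalStateException("TransactionalId " + transactionalId
                        + " completing transaction state transition while it does not"
                        + " have a pending state");
            }
            state = target;
            pendingState = Optional.empty();
        }

        synchronized State state() {
            return state;
        }
    }

    // Simulates two append callbacks for one prepared transition.
    // Returns true if the second callback is rejected.
    static boolean secondCompletionFails() {
        TxnMetadata metadata = new TxnMetadata("foo", State.PREPARE_COMMIT);
        metadata.prepareTransitionTo(State.COMPLETE_COMMIT);
        metadata.completeTransitionTo(State.COMPLETE_COMMIT); // first CompleteCommit entry
        try {
            metadata.completeTransitionTo(State.COMPLETE_COMMIT); // duplicate entry
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("second completion rejected: " + secondCompletionFails());
    }
}
```

In this model the first callback moves the metadata to `COMPLETE_COMMIT` and clears the pending state, so a second callback for the same transition has nothing left to complete and throws.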