Details
Type: Sub-task
Status: Resolved
Priority: Blocker
Resolution: Fixed
0.11.0.0
None
Description
I observed the following in the broker logs:
[2017-06-01 04:10:36,664] ERROR [Replica Manager on Broker 1]: Error processing append operation on partition __transaction_state-37 (kafka.server.ReplicaManager)
[2017-06-01 04:10:36,667] ERROR [TxnMarkerSenderThread-1]: Error due to (kafka.common.InterBrokerSendThread)
java.lang.StackOverflowError
    at java.security.AccessController.doPrivileged(Native Method)
    at java.io.PrintWriter.<init>(PrintWriter.java:116)
    at java.io.PrintWriter.<init>(PrintWriter.java:100)
    at org.apache.log4j.DefaultThrowableRenderer.render(DefaultThrowableRenderer.java:58)
    at org.apache.log4j.spi.ThrowableInformation.getThrowableStrRep(ThrowableInformation.java:87)
    at org.apache.log4j.spi.LoggingEvent.getThrowableStrRep(LoggingEvent.java:413)
    at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:313)
    at org.apache.log4j.DailyRollingFileAppender.subAppend(DailyRollingFileAppender.java:369)
    at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
    at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
    at org.apache.log4j.Category.callAppenders(Category.java:206)
    at org.apache.log4j.Category.forcedLog(Category.java:391)
    at org.apache.log4j.Category.error(Category.java:322)
    at kafka.utils.Logging$class.error(Logging.scala:105)
    at kafka.server.ReplicaManager.error(ReplicaManager.scala:122)
    at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:557)
    at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:505)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:505)
    at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:346)
    at kafka.coordinator.transaction.TransactionStateManager$$anonfun$appendTransactionToLog$1.apply$mcV$sp(TransactionStateManager.scala:589)
    at kafka.coordinator.transaction.TransactionStateManager$$anonfun$appendTransactionToLog$1.apply(TransactionStateManager.scala:570)
    at kafka.coordinator.transaction.TransactionStateManager$$anonfun$appendTransactionToLog$1.apply(TransactionStateManager.scala:570)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
    at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:219)
    at kafka.coordinator.transaction.TransactionStateManager.appendTransactionToLog(TransactionStateManager.scala:564)
    at kafka.coordinator.transaction.TransactionMarkerChannelManager.kafka$coordinator$transaction$TransactionMarkerChannelManager$$retryAppendCallback$1(TransactionMarkerChannelManager.scala:225)
    at kafka.coordinator.transaction.TransactionMarkerChannelManager$$anonfun$kafka$coordinator$transaction$TransactionMarkerChannelManager$$retryAppendCallback$1$4.apply(TransactionMarkerChannelManager.scala:225)
    at kafka.coordinator.transaction.TransactionMarkerChannelManager$$anonfun$kafka$coordinator$transaction$TransactionMarkerChannelManager$$retryAppendCallback$1$4.apply(TransactionMarkerChannelManager.scala:225)
    at kafka.coordinator.transaction.TransactionStateManager.kafka$coordinator$transaction$TransactionStateManager$$updateCacheCallback$1(TransactionStateManager.scala:561)
    at kafka.coordinator.transaction.TransactionStateManager$$anonfun$appendTransactionToLog$1$$anonfun$apply$mcV$sp$4.apply(TransactionStateManager.scala:595)
    at kafka.coordinator.transaction.TransactionStateManager$$anonfun$appendTransactionToLog$1$$anonfun$apply$mcV$sp$4.apply(TransactionStateManager.scala:595)
    at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:373)
The upshot of this bug is that the commit of a transaction gets stuck: if the markers can't be written, the transaction stays in the `PREPARE_COMMIT` state, and every producer retry fails with a `CONCURRENT_TRANSACTIONS` error.
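The visible frames in the trace suggest a cycle: `ReplicaManager.appendRecords` invokes its completion callback synchronously, the callback calls `retryAppendCallback`, and that re-enters `appendTransactionToLog` and `appendRecords` on the same thread. If the append keeps failing, each retry adds frames until the stack overflows. The following is a minimal sketch of that failure pattern and of one generic way to avoid it (deferring the retry to a work queue). The class `RetrySketch` and its methods are hypothetical stand-ins for illustration; this is not Kafka's code and not necessarily how the issue was fixed.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical stand-in for a log append that reports its result
// by invoking the completion callback synchronously.
final class RetrySketch {
    private int remainingFailures;

    RetrySketch(int failures) {
        this.remainingFailures = failures;
    }

    // Fails for the first `failures` calls; the callback runs on the caller's stack.
    void append(Consumer<Boolean> onComplete) {
        boolean ok = remainingFailures <= 0;
        if (!ok) {
            remainingFailures--;
        }
        onComplete.accept(ok);
    }

    // Retrying from inside the callback: every failed attempt nests deeper
    // on the same thread's stack, so persistent failures overflow it.
    void retryInline() {
        append(ok -> {
            if (!ok) {
                retryInline();
            }
        });
    }

    // Retrying via a queue: the callback only enqueues the next attempt and returns,
    // so the stack unwinds between attempts. Returns the number of attempts made.
    int retryViaQueue() {
        Queue<Runnable> pending = new ArrayDeque<>();
        int[] attempts = {0};
        Runnable[] task = new Runnable[1];
        task[0] = () -> {
            attempts[0]++;
            append(ok -> {
                if (!ok) {
                    pending.add(task[0]);
                }
            });
        };
        pending.add(task[0]);
        while (!pending.isEmpty()) {
            pending.poll().run();
        }
        return attempts[0];
    }

    public static void main(String[] args) {
        try {
            new RetrySketch(1_000_000).retryInline();
        } catch (StackOverflowError e) {
            System.out.println("inline retry overflowed the stack");
        }
        int attempts = new RetrySketch(1_000_000).retryViaQueue();
        System.out.println("queued retry attempts: " + attempts);
    }
}
```

In a real broker the queue would more likely be a scheduler or the sender thread's own work loop, with a backoff between attempts; the sketch only shows why the retry must not run on the callback's stack.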