Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
1.1.1
-
None
-
None
Description
We're seeing the following exception in our replication threads.
[2019-04-16 14:14:39,724] ERROR [ReplicaFetcher replicaId=15, leaderId=8, fetcherId=0] Error due to (kafka.server.ReplicaFetcherThread) kafka.common.KafkaException: Error processing data for partition testtopic-123 offset 9880379 at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:204) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:169) at scala.Option.foreach(Option.scala:257) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:169) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:166) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:166) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:166) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:166) at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:250) at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:164) at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82) Caused by: org.apache.kafka.common.errors.TransactionCoordinatorFencedException: Invalid coordinator epoch: 27 (zombie), 31 (current)
While this is an issue itself the larger issue is that this exception kills the replication threads so no other partitions get replicated to this broker. That a single corrupt partition can affect the availability of multiple topics is a great concern to us.
Attachments
Issue Links
- is fixed by
-
KAFKA-9144 Early expiration of producer state can cause coordinator epoch to regress
- Resolved