KAFKA-8242 (project: Kafka)

Exception in ReplicaFetcher blocks replication of all other partitions


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.1
    • Fix Version/s: None
    • Component/s: replication
    • Labels: None

    Description

      We're seeing the following exception in our replication threads. 

      [2019-04-16 14:14:39,724] ERROR [ReplicaFetcher replicaId=15, leaderId=8, fetcherId=0] Error due to (kafka.server.ReplicaFetcherThread)
      kafka.common.KafkaException: Error processing data for partition testtopic-123 offset 9880379
      at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:204)
      at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:169)
      at scala.Option.foreach(Option.scala:257)
      at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:169)
      at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:166)
      at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
      at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:166)
      at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:166)
      at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:166)
      at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:250)
      at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:164)
      at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111)
      at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
      Caused by: org.apache.kafka.common.errors.TransactionCoordinatorFencedException: Invalid coordinator epoch: 27 (zombie), 31 (current)
      

      While this exception is a problem in itself, the larger issue is that it kills the replication threads, so no other partitions are replicated to this broker. That a single corrupt partition can affect the availability of multiple topics is a serious concern to us.
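The failure mode above (one partition's error terminating a thread that serves many partitions) can be illustrated with a minimal, hypothetical Java sketch of per-partition error isolation. This is not Kafka's actual AbstractFetcherThread code, and the report does not say how the issue was fixed: the class and method names (FetcherRoundSketch, processFetchResponse, failedPartitions) are invented for illustration, and IllegalStateException stands in for TransactionCoordinatorFencedException so the example needs no Kafka dependency.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not Kafka's API): one fetch round that isolates
 * per-partition failures instead of letting them terminate the thread.
 */
public class FetcherRoundSketch {
    // Partitions whose fetched data could not be processed.
    private final Set<String> failedPartitions = new LinkedHashSet<>();

    /**
     * Processes one fetch response (partition -> fetch offset).
     * Returns the partitions that were processed successfully.
     */
    public List<String> processFetchResponse(Map<String, Long> fetchedOffsets,
                                             Consumer<String> processPartition) {
        List<String> processed = new ArrayList<>();
        for (Map.Entry<String, Long> entry : fetchedOffsets.entrySet()) {
            String partition = entry.getKey();
            if (failedPartitions.contains(partition)) {
                continue; // skip partitions already marked as failed
            }
            try {
                processPartition.accept(partition);
                processed.add(partition);
            } catch (RuntimeException e) {
                // Record only this partition as failed; the loop (and thread)
                // keeps going, so other partitions are still replicated.
                failedPartitions.add(partition);
                System.err.printf("Error processing data for partition %s offset %d: %s%n",
                        partition, entry.getValue(), e.getMessage());
            }
        }
        return processed;
    }

    public Set<String> failedPartitions() {
        return failedPartitions;
    }

    public static void main(String[] args) {
        FetcherRoundSketch fetcher = new FetcherRoundSketch();
        Map<String, Long> response = new LinkedHashMap<>();
        response.put("testtopic-122", 100L);
        response.put("testtopic-123", 9880379L);
        response.put("othertopic-0", 42L);

        // Simulate the reported error on a single partition.
        List<String> ok = fetcher.processFetchResponse(response, p -> {
            if (p.equals("testtopic-123")) {
                throw new IllegalStateException("Invalid coordinator epoch: 27 (zombie), 31 (current)");
            }
        });
        // Prints: processed=[testtopic-122, othertopic-0] failed=[testtopic-123]
        System.out.println("processed=" + ok + " failed=" + fetcher.failedPartitions());
    }
}
```

The design trade-off is that a failed partition silently stops replicating on this broker until it is handled (for example, by an operator or a retry policy, neither of which is modelled here), but the other partitions served by the same thread remain available.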


            People

              Assignee: Unassigned
              Reporter: Nevins Bartolomeo (nevins-b)
              Votes: 0
              Watchers: 3
