The current replica fetcher thread will retry in a tight loop if any error occurs during the fetch call. For example, we've seen cases where the fetch continuously throws a connection-refused exception, leaving several replica fetcher threads spinning in a tight loop.
To a much lesser degree, this is also an issue in the consumer fetcher thread, although the fact that failing partitions are removed (so that a leader can be re-discovered) helps somewhat.
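The back-off behavior that the summary asks for can be sketched as follows. This is a minimal Python illustration, not Kafka's fetcher code. It assumes a fetch loop built around a hypothetical `fetch` callable and a hypothetical `backoff_ms` setting, and the class name `BackoffFetcher` is also made up for this sketch. After a failed fetch, the thread waits on a condition variable for the back-off interval instead of retrying immediately, and `shutdown()` can wake it early.

```python
import threading
import time
from typing import Callable, List


class BackoffFetcher:
    """Hypothetical fetch loop that backs off after a failed fetch.

    `fetch` and `backoff_ms` are illustrative names, not Kafka's API.
    """

    def __init__(self, fetch: Callable[[], None], backoff_ms: int) -> None:
        self._fetch = fetch
        self._backoff_s = backoff_ms / 1000.0
        self._cond = threading.Condition()
        self._running = True
        self.errors: List[Exception] = []

    def shutdown(self) -> None:
        # Stop the loop and wake a thread that is waiting out its back-off.
        with self._cond:
            self._running = False
            self._cond.notify_all()

    def run(self, max_iterations: int) -> None:
        for _ in range(max_iterations):
            with self._cond:
                if not self._running:
                    return
            try:
                self._fetch()
            except Exception as exc:  # e.g. ConnectionRefusedError
                self.errors.append(exc)
                # Instead of retrying immediately (the tight loop), wait for
                # the back-off interval; shutdown() can cut the wait short.
                with self._cond:
                    if self._running:
                        self._cond.wait(timeout=self._backoff_s)


if __name__ == "__main__":
    def always_refused() -> None:
        raise ConnectionRefusedError("connection refused")

    fetcher = BackoffFetcher(always_refused, backoff_ms=100)
    start = time.monotonic()
    fetcher.run(max_iterations=3)
    # Three failures, each followed by a back-off of roughly 100 ms.
    print(f"{len(fetcher.errors)} errors in {time.monotonic() - start:.2f}s")
```

Waiting on a condition rather than calling `time.sleep` is a design choice in this sketch. It lets a shutdown interrupt the back-off, so a failing thread does not delay a clean stop by up to one full interval, while successful fetches still proceed without any added delay.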
|Field||Original Value||New Value|
|Summary||Replica fetcher threads do not implement any back-off behavior||Replica fetcher thread does not implement any back-off behavior|
|Assignee||Neha Narkhede [ nehanarkhede ]||nicu marasoiu [ nmarasoi ]|
|Fix Version/s||0.8.3 [ 12328745 ]|
|Assignee||nicu marasoiu [ nmarasoi ]||Sriharsha Chintalapani [ sriharsha ]|
|Status||Open [ 1 ]||Patch Available [ 10002 ]|
|Comment||[ Here is my reasoning. Say you are an operations person, and in the next release we tell folks to read the KIP to learn and understand the changes that affect them (yada yada language for the release), and something like this isn't in there. We are changing the behavior of an existing config and removing another, so leaving this out makes the release's communication about behavior changes inconsistent. So, while I agree we don't technically "need it," this consistency is why I brought it up. I was just looking at it from the release perspective: what ops folks are going to be looking at when we get there. ]|
|Status||Patch Available [ 10002 ]||Resolved [ 5 ]|
|Resolution||Fixed [ 1 ]|
|Workflow||no-reopen-closed, patch-avail [ 12863372 ]||Apache Kafka Workflow [ 13051747 ]|
|Workflow||Apache Kafka Workflow [ 13051747 ]||no-reopen-closed, patch-avail [ 13053988 ]|