Kafka / KAFKA-9849

Fix issue with worker.unsync.backoff.ms creating zombie workers when incremental cooperative rebalancing is used



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.1, 2.5.0, 2.4.1
    • Fix Version/s: 2.3.2, 2.6.0, 2.4.2, 2.5.1
    • Component/s: connect
    • Labels: None


      worker.unsync.backoff.ms is a property that was introduced a while ago when eager (stop-the-world) rebalancing was the only option for Connect workers. The goal of this property is to avoid triggering consecutive rebalances when a worker fails to catch up with the config topic in time and therefore voluntarily leaves the group with a LeaveGroupRequest.

      With incremental cooperative rebalancing, this backoff (worker.unsync.backoff.ms), whose default value equals the default value of scheduled.rebalance.max.delay.ms (5 min), might end up turning a worker into a zombie worker that retains its tasks but stays out of the group. By backing off from rebalancing, this worker leaves the leader of the group no option but to reassign the missing tasks, which are presumed lost, to other members of the group if the worker does not return before scheduled.rebalance.max.delay.ms expires.
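
The timing problem described above can be sketched with a small model. This is only an illustration, not Connect's actual code: the function name, the `rejoin_overhead_ms` parameter, and the assumption that the leader's delay starts at the same instant the worker leaves are all hypothetical. Only the two property names and their 5-minute defaults come from this issue.

```python
from dataclasses import dataclass

# Both properties default to 5 minutes according to the issue description.
DEFAULT_DELAY_MS = 5 * 60 * 1000


@dataclass
class BackoffOutcome:
    rejoin_at_ms: int        # when the backing-off worker is back in the group
    tasks_reassigned: bool   # did the leader give up and reassign its tasks?
    duplicated_ms: int       # time the tasks run on two workers at once


def simulate_backoff(unsync_backoff_ms=DEFAULT_DELAY_MS,
                     max_delay_ms=DEFAULT_DELAY_MS,
                     rejoin_overhead_ms=0):
    """Simplified timeline, with t=0 being the moment the worker leaves.

    Assumptions (not taken from the Connect source):
    - the leader starts its scheduled rebalance delay at t=0;
    - the worker keeps running its tasks during the whole backoff;
    - rejoin_overhead_ms is a hypothetical extra cost of rejoining.
    """
    rejoin_at_ms = unsync_backoff_ms + rejoin_overhead_ms
    # The worker must return strictly before the delay expires.
    tasks_reassigned = rejoin_at_ms >= max_delay_ms
    duplicated_ms = max(0, rejoin_at_ms - max_delay_ms)
    return BackoffOutcome(rejoin_at_ms, tasks_reassigned, duplicated_ms)


if __name__ == "__main__":
    print(simulate_backoff())                          # defaults: both 5 min
    print(simulate_backoff(unsync_backoff_ms=60_000))  # shorter backoff
```

Under this model, the default configuration leaves the worker no margin: it returns no earlier than the moment the leader stops waiting, so any shorter backoff (or longer delay) is what keeps the tasks from being reassigned.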

      Clearly, worker.unsync.backoff.ms was introduced to avoid rebalancing storms in the presence of intermittent connectivity issues with eager rebalancing. However, when incremental cooperative rebalancing is used, this property might inadvertently turn workers into zombie workers that keep running tasks while they are out of the group.

      Of course, a good tradeoff needs to be made between not making the protocol too eager again and not turning workers into zombies when the connection to the broker coordinator is lost only briefly.





              Assignee: Konstantine Karantasis (kkonstantine)
              Reporter: Konstantine Karantasis (kkonstantine)