[KAFKA-15059] Exactly-once source tasks fail to start during pending rebalances - ASF JIRA


Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 3.6.0
Fix Version/s: 3.6.0
Component/s: connect, mirrormaker
Labels: None

Description

When asked to perform a round of zombie fencing (which must complete before exactly-once source tasks can start), the distributed herder rejects the request if a rebalance is pending. This can happen if (among other things) a config for a new connector or a new set of task configs has recently been read from the config topic.

Normally this can be alleviated with a simple task restart, which isn't great but isn't terrible.

However, when running MirrorMaker 2 in dedicated mode, there is no API to restart failed tasks. This kind of failure can also be more common on a fresh cluster, because three connector configurations are written to the config topic in rapid succession.

To provide a better experience for users of both vanilla Kafka Connect and dedicated MirrorMaker 2 clusters, we can retry zombie fencing attempts that fail due to a pending rebalance, likely with the same exponential backoff introduced in KAFKA-14732.
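The proposed retry can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the code from GitHub Pull Request #13819: `PendingRebalanceException`, `retryOnPendingRebalance`, `backoffMs`, and the backoff parameters are hypothetical names. Only failures caused by a pending rebalance are retried; any other exception propagates immediately. For simplicity the sketch blocks the calling thread with `Thread.sleep` between attempts.

```java
import java.util.concurrent.Callable;

public class FencingRetrySketch {

    /** Hypothetical stand-in for the error raised when fencing is rejected due to a pending rebalance. */
    static class PendingRebalanceException extends RuntimeException {
        PendingRebalanceException(String message) {
            super(message);
        }
    }

    /** Delay before the retry following attempt {@code attempt} (0-based): initialMs * multiplier^attempt, capped at maxMs. */
    static long backoffMs(int attempt, long initialMs, double multiplier, long maxMs) {
        double delay = initialMs * Math.pow(multiplier, attempt);
        return (long) Math.min(delay, (double) maxMs);
    }

    /**
     * Runs a (hypothetical) zombie fencing attempt, retrying only when it fails with
     * PendingRebalanceException. After maxAttempts such failures, the last one is rethrown.
     */
    static <T> T retryOnPendingRebalance(Callable<T> fencing, int maxAttempts,
                                         long initialMs, double multiplier, long maxMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        PendingRebalanceException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return fencing.call();
            } catch (PendingRebalanceException e) {
                last = e;
                // Back off before the next attempt; no sleep after the final failure.
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(backoffMs(attempt, initialMs, multiplier, maxMs));
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated fencing request: rejected twice while a rebalance is pending, then succeeds.
        String result = retryOnPendingRebalance(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new PendingRebalanceException("rebalance pending");
            }
            return "fenced";
        }, 5, 10, 2.0, 1000);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "fenced after 3 attempts"
    }
}
```

Capping the delay keeps a long-running rebalance from pushing retries arbitrarily far apart, while restricting retries to the pending-rebalance case avoids masking unrelated fencing failures.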


Issue Links

causes: KAFKA-14718 Flaky DedicatedMirrorIntegrationTest test suite (Resolved)

links to: GitHub Pull Request #13819


People

Assignee: Chris Egerton

Reporter: Chris Egerton

Votes: 1

Watchers: 4

Dates

Created: 05/Jun/23 21:38

Updated: 11/Jul/23 14:23

Resolved: 21/Jun/23 08:58