Kafka / KAFKA-15059

Exactly-once source tasks fail to start during pending rebalances


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.6.0
    • Fix Version/s: 3.6.0
    • Component/s: connect, mirrormaker
    • Labels: None

    Description

      When asked to perform a round of zombie fencing, the distributed herder rejects the request if a rebalance is pending. This can happen if, among other things, a config for a new connector or a new set of task configs has recently been read from the config topic.

      Normally, this can be worked around with a simple task restart, which isn't great but isn't terrible.
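For vanilla Kafka Connect, that restart can be issued through the worker's REST API with `POST /connectors/{name}/tasks/{taskId}/restart`. The sketch below only builds that request with the JDK's `java.net.http` classes. The helper name `restartTaskRequest`, the worker URL `http://localhost:8083` (the default Connect REST port), the connector name, and the task ID are all placeholders chosen for illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds the Kafka Connect REST request for restarting one task. */
public class TaskRestart {

    /** Builds POST {workerUrl}/connectors/{connector}/tasks/{taskId}/restart (workerUrl without a trailing slash). */
    public static HttpRequest restartTaskRequest(String workerUrl, String connector, int taskId) {
        // URLEncoder applies form encoding ('+' for spaces); a URL path segment needs %20 instead.
        String encodedName = URLEncoder.encode(connector, StandardCharsets.UTF_8).replace("+", "%20");
        URI uri = URI.create(workerUrl + "/connectors/" + encodedName + "/tasks/" + taskId + "/restart");
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        // Placeholder worker address, connector name, and task ID.
        HttpRequest request = restartTaskRequest("http://localhost:8083", "my-source-connector", 0);
        System.out.println(request.method() + " " + request.uri());
        // Sending it requires a running worker, e.g.:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Sending the request is a single `HttpClient.send(...)` call; it is left commented out so the example runs without a live worker.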

      However, when running MirrorMaker 2 in dedicated mode, there is no API to restart failed tasks, and this kind of failure can be more common on a fresh cluster because three connector configurations are written to the config topic in rapid succession.
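To make the failure mode concrete, here is a toy model in Java. It is a hypothetical simplification, not the real `DistributedHerder`: `ToyHerder`, its methods, and `RebalancePendingException` are invented for illustration, and a single boolean stands in for the herder's actual rebalance bookkeeping. The three connector names are the ones dedicated MirrorMaker 2 runs. The model shows that once config records have been read, a fencing request made before the rebalance completes is rejected rather than deferred.

```java
import java.util.List;

/** Toy model of the failure mode; all names are hypothetical, not Kafka Connect's real classes. */
public class PendingRebalanceModel {

    /** Hypothetical error for a fencing request rejected during a pending rebalance. */
    public static class RebalancePendingException extends RuntimeException {
        public RebalancePendingException(String connector) {
            super("Cannot fence zombies for " + connector + ": a rebalance is pending");
        }
    }

    public static class ToyHerder {
        private boolean rebalancePending = false;

        /** Reading a new connector config or new task configs from the config topic schedules a rebalance. */
        public void onConfigRecordRead(String connector) {
            rebalancePending = true;
        }

        /** Finishing the rebalance clears the pending flag. */
        public void onRebalanceCompleted() {
            rebalancePending = false;
        }

        /** Zombie fencing is rejected outright (no retry) while a rebalance is pending. */
        public void fenceZombies(String connector) {
            if (rebalancePending) {
                throw new RebalancePendingException(connector);
            }
            // A real herder would fence the producers of the previous task generation here.
        }
    }

    /** Returns true if a single fencing attempt succeeds, false if it is rejected. */
    public static boolean tryFence(ToyHerder herder, String connector) {
        try {
            herder.fenceZombies(connector);
            return true;
        } catch (RebalancePendingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ToyHerder herder = new ToyHerder();
        // Dedicated MirrorMaker 2 writes its connector configs in rapid succession.
        List<String> connectors = List.of(
                "MirrorSourceConnector", "MirrorCheckpointConnector", "MirrorHeartbeatConnector");
        for (String connector : connectors) {
            herder.onConfigRecordRead(connector);
        }
        // A task starting now requests fencing and is rejected; nothing retries it in this model.
        System.out.println("fencing before rebalance: " + tryFence(herder, "MirrorSourceConnector"));
        herder.onRebalanceCompleted();
        System.out.println("fencing after rebalance: " + tryFence(herder, "MirrorSourceConnector"));
    }
}
```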

       

      In order to provide a better experience for users of both vanilla Kafka Connect and dedicated MirrorMaker 2 clusters, we can retry zombie fencing attempts that fail due to a pending rebalance, likely with the same exponential backoff introduced in KAFKA-14732.
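A minimal sketch of the proposed retry, assuming the rejection surfaces as a distinguishable exception. `FencingRetry`, `RebalancePendingException`, and `retryOnPendingRebalance` are hypothetical names, and the backoff values in `main` (start at 100 ms, double, cap at 10 s, at most 5 attempts) are illustrative assumptions, not the parameters used by KAFKA-14732 or by the eventual fix.

```java
import java.util.function.LongConsumer;
import java.util.function.Supplier;

/** Sketch of retrying zombie fencing on a pending rebalance; names are hypothetical. */
public class FencingRetry {

    /** Hypothetical signal that a fencing request was rejected due to a pending rebalance. */
    public static class RebalancePendingException extends RuntimeException {
        public RebalancePendingException(String message) {
            super(message);
        }
    }

    /** Default back-off that really sleeps; tests can pass a recording LongConsumer instead. */
    public static final LongConsumer THREAD_SLEEP = millis -> {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while backing off", e);
        }
    };

    /**
     * Runs attempt, retrying only on RebalancePendingException. The delay starts at
     * initialBackoffMs and doubles after each failure, capped at maxBackoffMs. After
     * maxAttempts failed attempts the last RebalancePendingException is rethrown;
     * any other exception propagates immediately without a retry.
     */
    public static <T> T retryOnPendingRebalance(Supplier<T> attempt, int maxAttempts,
                                                long initialBackoffMs, long maxBackoffMs,
                                                LongConsumer sleeper) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long backoffMs = initialBackoffMs;
        for (int attemptNumber = 1; ; attemptNumber++) {
            try {
                return attempt.get();
            } catch (RebalancePendingException e) {
                if (attemptNumber >= maxAttempts) {
                    throw e;
                }
                sleeper.accept(backoffMs);
                backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical fencing call: rejected twice (rebalance still pending), then succeeds.
        String outcome = retryOnPendingRebalance(() -> {
            calls[0]++;
            if (calls[0] <= 2) {
                throw new RebalancePendingException("rebalance pending");
            }
            return "fenced after " + calls[0] + " attempts";
        }, 5, 100, 10_000, THREAD_SLEEP);
        System.out.println(outcome);
    }
}
```

Retrying only on the pending-rebalance signal keeps unrelated fencing failures from being masked, and the cap keeps the wait bounded even if several config records arrive back to back, as in the dedicated MirrorMaker 2 startup case.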


          People

            Assignee: Chris Egerton
            Reporter: Chris Egerton
            Votes: 1
            Watchers: 4
