  Kafka / KAFKA-15059

Exactly-once source tasks fail to start during pending rebalances

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.6.0
    • Fix Version/s: 3.6.0
    • Component/s: connect, mirrormaker
    • Labels: None

    Description

      When asked to perform a round of zombie fencing, the distributed herder rejects the request if a rebalance is pending. A rebalance can be pending if (among other things) the config for a new connector or a new set of task configs has recently been read from the config topic. When this happens, the exactly-once source tasks that depend on that fencing round fail to start.

      Normally, this can be worked around by restarting the failed task, which isn't great but isn't terrible.

      However, MirrorMaker 2 in dedicated mode offers no API to restart failed tasks. This kind of failure can also be more common on a fresh dedicated cluster, because three connector configurations (for the MirrorSourceConnector, MirrorCheckpointConnector, and MirrorHeartbeatConnector) are written to the config topic in rapid succession.
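
      The failure window can be modeled with a toy state machine. This is a hypothetical sketch, not the DistributedHerder implementation: the class, its methods, and the connector labels are illustrative assumptions, and a simple counter of configs read since the last rebalance stands in for the herder's real bookkeeping.

```java
/**
 * Hypothetical model of the failure (not Kafka code): reading new configs from
 * the config topic marks a rebalance as pending, and a zombie fencing request
 * made in that window is rejected instead of waiting.
 */
public class PendingRebalanceSketch {
    // Assumption: a counter stands in for the herder's real "rebalance needed" state.
    private int configsReadSinceRebalance = 0;

    /** New connector or task configs read from the config topic; one of several rebalance triggers. */
    public void onNewConfigRead(String connector) {
        configsReadSinceRebalance++;
    }

    /** The group has rebalanced, so nothing is pending any more. */
    public void onRebalanceCompleted() {
        configsReadSinceRebalance = 0;
    }

    public boolean isRebalancePending() {
        return configsReadSinceRebalance > 0;
    }

    /** Returns true if the fencing round may proceed, false if it is rejected. */
    public boolean requestZombieFencing(String connector) {
        // Reject outright while a rebalance is pending, as described in the report.
        return !isRebalancePending();
    }

    public static void main(String[] args) {
        PendingRebalanceSketch herder = new PendingRebalanceSketch();
        // A fresh dedicated MirrorMaker 2 cluster writes three connector configs quickly
        // (labels are hypothetical).
        for (String connector : new String[] {"source", "checkpoint", "heartbeat"}) {
            herder.onNewConfigRead(connector);
        }
        // A task that needs fencing now has its request rejected and fails to start.
        boolean first = herder.requestZombieFencing("source");
        // After the rebalance completes, a manual restart (or a retry) succeeds.
        herder.onRebalanceCompleted();
        boolean second = herder.requestZombieFencing("source");
        System.out.println("first=" + first + ", second=" + second);
        // prints "first=false, second=true"
    }
}
```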

      To give users of both vanilla Kafka Connect and dedicated MirrorMaker 2 clusters a better experience, we can retry zombie fencing attempts that fail due to a pending rebalance, likely with the same exponential backoff introduced with KAFKA-14732.
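
      A minimal sketch of the proposed retry, assuming a synchronous fencing call and a hypothetical RebalancePendingException that signals the rejection. The doubling schedule, its initial and maximum delays, and the attempt limit are assumptions for illustration, not the parameters from KAFKA-14732, and every class, interface, and method name here is hypothetical.

```java
import java.util.function.LongConsumer;

/**
 * Hypothetical sketch (not Kafka code): retry a zombie fencing request that was
 * rejected because a rebalance is pending, with capped exponential backoff.
 */
public class FencingRetrySketch {

    /** Hypothetical signal that fencing was rejected because a rebalance is pending. */
    public static class RebalancePendingException extends RuntimeException {
        public RebalancePendingException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for asking the leader to fence zombie tasks of a connector. */
    @FunctionalInterface
    public interface FencingRequest {
        void fence(String connector);
    }

    /** Delay after failed attempt number {@code attempt} (0-based): initialMs * 2^attempt, capped at maxMs. */
    public static long backoffMs(int attempt, long initialMs, long maxMs) {
        long delay = initialMs;
        for (int i = 0; i < attempt && delay < maxMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMs);
    }

    /**
     * Calls the request until it succeeds and returns the number of attempts made.
     * Only RebalancePendingException is retried; other exceptions propagate at once.
     * After maxAttempts rejections the last RebalancePendingException is rethrown.
     */
    public static int fenceWithRetry(String connector, FencingRequest request, int maxAttempts,
                                     long initialMs, long maxMs, LongConsumer sleeper) {
        for (int attempt = 0; ; attempt++) {
            try {
                request.fence(connector);
                return attempt + 1;
            } catch (RebalancePendingException e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e;
                }
                sleeper.accept(backoffMs(attempt, initialMs, maxMs));
            }
        }
    }

    public static void main(String[] args) {
        // Fake request: rejected twice while the rebalance is pending, then accepted.
        int[] calls = {0};
        FencingRequest flaky = connector -> {
            if (calls[0]++ < 2) {
                throw new RebalancePendingException("rebalance pending for " + connector);
            }
        };
        StringBuilder delays = new StringBuilder();
        int attempts = fenceWithRetry("mirror-source", flaky, 5, 100, 1_000,
                ms -> delays.append(ms).append(' '));
        System.out.println("attempts=" + attempts + ", delays=" + delays.toString().trim());
        // prints "attempts=3, delays=100 200"
    }
}
```

      Only the pending-rebalance rejection is retried; any other failure propagates immediately, and once the attempt budget runs out the last rejection is rethrown so the task still fails visibly. The sleep is injected as a LongConsumer so the sketch runs without real delays; blocking a thread is just the simplest way to show the backoff schedule.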

            People

              Assignee: Chris Egerton
              Reporter: Chris Egerton
              Votes: 1
              Watchers: 4