KAFKA-12664

Mirrormaker 2.0 infinite rebalance loop when dealing with more than 2 clusters in standalone mode

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: 2.5.0, 2.4.1, 2.6.0, 2.7.0
    • Fix Version/s: None
    • Component/s: mirrormaker
    • Labels: None

    Description

      Hi folks, I came across this issue when trying to aggregate data from two separate data centres into one data centre.

      In the configuration below, I am trying to replicate a topic from dc1 (test_topic_dc1) to dc3, and a topic from dc2 (test_topic_dc2) to dc3.

      However, when I replicate both topics at the same time, Connect gets stuck in a rebalance loop (see the attached logs):
      connect.log.tar.gz

      Excerpt of connect.log:

      [2021-04-13 17:03:06,360] INFO [Worker clientId=connect-3, groupId=mm2-dc2] Successfully synced group in generation Generation{generationId=347, memberId='connect-3-c59342c3-ca62-41cc-964c-41a0f98351c0', protocol='sessioned'} (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:756)
      [2021-04-13 17:03:06,360] INFO [Worker clientId=connect-4, groupId=mm2-dc2] Rebalance started (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator:225)
      [2021-04-13 17:03:06,362] INFO [Worker clientId=connect-4, groupId=mm2-dc2] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:540)
      [2021-04-13 17:03:06,368] INFO [Worker clientId=connect-2, groupId=mm2-dc3] Rebalance started (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator:225)
      [2021-04-13 17:03:06,369] INFO [Worker clientId=connect-2, groupId=mm2-dc3] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:540)
      [2021-04-13 17:03:06,370] INFO [Worker clientId=connect-3, groupId=mm2-dc2] Joined group at generation 347 with protocol version 2 and got assignment: Assignment{error=1, leader='connect-3-c59342c3-ca62-41cc-964c-41a0f98351c0', leaderUrl='NOTUSED/dc1', offset=13, connectorIds=[MirrorSourceConnector], taskIds=[], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with rebalance delay: 0 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1688)
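To quantify the loop rather than eyeball it, the "Rebalance started" lines in the log can be tallied per worker group. The following is a minimal sketch, assuming the log keeps the `[Worker clientId=..., groupId=...]` prefix shown in the excerpt; the function name `count_rebalances` is hypothetical and not part of Kafka.

```python
import re
from collections import Counter

# Matches the worker prefix followed by the "Rebalance started" message,
# in the shape seen in the excerpt of connect.log.
REBALANCE_RE = re.compile(
    r"\[Worker clientId=([^,]+), groupId=([^\]]+)\] Rebalance started"
)


def count_rebalances(log_text: str) -> Counter:
    """Count 'Rebalance started' events per (groupId, clientId) pair."""
    return Counter(
        (group, client) for client, group in REBALANCE_RE.findall(log_text)
    )
```

For example, `count_rebalances(open("connect.log").read())` would return a counter keyed by `(groupId, clientId)`. A count that keeps growing for the same group while the worker is otherwise idle would be consistent with the loop described here.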
      

      To reproduce the issue, here is what I used:

      mm2.properties

      clusters = dc1, dc2, dc3
      dc1.bootstrap.servers = kafka-dc1:19092
      dc2.bootstrap.servers = kafka-dc2:19093
      dc3.bootstrap.servers = kafka-dc3:19094
      dc1.group.id=mm2-dc1
      dc2.group.id=mm2-dc2
      dc3.group.id=mm2-dc3
      replication.factor=1
      checkpoints.topic.replication.factor=1
      heartbeats.topic.replication.factor=1
      offset-syncs.topic.replication.factor=1
      offset.storage.replication.factor=1
      status.storage.replication.factor=1
      config.storage.replication.factor=1
      dc1->dc3.enabled = true
      dc1->dc3.topics = test_topic_dc1
      dc2->dc3.enabled = true
      dc2->dc3.topics = test_topic_dc2
      dc3->dc2 = false
      dc3->dc1 = false
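As a quick sanity check on a file like this, the flow-scoped keys (`source->target.property`) can be listed programmatically. The sketch below is a plain-text lint written for illustration only: `parse_flows` is a hypothetical helper, it does not use any Kafka API, and it assumes cluster aliases contain no dots. It also collects flow keys that carry no `.property` suffix, such as `dc3->dc2`, so they can be reviewed by hand. It is not a diagnosis of the rebalance loop.

```python
def parse_flows(text):
    """Split MM2-style properties text into flow settings.

    Returns (flows, suspicious):
      flows      maps (source, target) -> {property: value}
      suspicious lists flow keys that have no '.property' suffix
    """
    flows, suspicious = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if "->" not in key:
            continue  # cluster-level or global setting, not a flow
        source, _, rest = key.partition("->")
        target, dot, prop = rest.partition(".")
        if not dot:
            suspicious.append(key)
            continue
        flows.setdefault((source.strip(), target.strip()), {})[prop] = value
    return flows, suspicious
```

Running it over the configuration above would report the `dc1->dc3` and `dc2->dc3` flows with their `enabled` and `topics` values, and would list the two `dc3->...` keys separately for review.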
      

      I used this docker-compose-multi.yml file to create local Kafka clusters (dc1, dc2, dc3).
      (I set Docker to use 6 CPUs, 8 GB of memory, and 2 GB of swap.)

      I then started an interactive shell in the same docker-compose network to run MirrorMaker (change the network name to match yours):

      docker run --network kafka-examples_default -it wurstmeister/kafka:latest bash
      
      # Upload mm2 properties on server
      
      /opt/kafka/bin/connect-mirror-maker.sh mm2.properties

      kafkacat commands to produce to dc1 and dc2:

      kafkacat -b localhost:9092 -t test_topic_dc1 -P
      Hello World from DC1!
      kafkacat -b localhost:9093 -t test_topic_dc2 -P
      Hello World from DC2

      I then removed one of the data centres to check whether this was a configuration problem. MirrorMaker ran successfully with the configuration below:

      mm2.properties

      clusters = dc2, dc3
      dc2.bootstrap.servers = kafka-dc2:19093
      dc3.bootstrap.servers = kafka-dc3:19094
      dc2.group.id=mm2-dc2
      dc3.group.id=mm2-dc3
      replication.factor=1
      checkpoints.topic.replication.factor=1
      heartbeats.topic.replication.factor=1
      offset-syncs.topic.replication.factor=1
      offset.storage.replication.factor=1
      status.storage.replication.factor=1
      config.storage.replication.factor=1
      dc2->dc3.enabled = true
      dc2->dc3.topics = test_topic_dc2
      

      Any help would be appreciated!

      Attachments

        1. connect.log.tar.gz
          844 kB
          Edward Vaisman
        2. docker-compose-multi.yml
          4 kB
          Edward Vaisman
        3. mm2.properties
          0.6 kB
          Edward Vaisman

          People

            Assignee: Daniel Urban (durban)
            Reporter: Edward Vaisman (eddyv)