Kafka / KAFKA-10857

Mirror Maker 2 - replication not working when deploying multiple instances


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.6.0, 2.5.1
    • Fix Version/s: None
    • Component/s: connect, mirrormaker
    • Labels: None

    Description

      We believe we are experiencing a bug when deploying Mirror Maker 2 in distributed mode in our environments. After the initial deployment, replication does not work reliably, and it does not recover on its own even after 24+ hours.

      Environment & replication set-up

      • 2 regions, each with its own Kafka cluster (let's call them Region A and Region B)
      • 3 instances of Mirror Maker are deployed at the same time in Region B with the same configuration
      • Replication is set up to be bi-directional (regionA->regionB & regionB->regionA)

      Container Version
      Observed with both confluentinc/cp-kafka:5.5.1 & confluentinc/cp-kafka:6.0.1

      Mirror maker 2 configuration

      clusters=regionA,regionB
      regionA.bootstrap.servers=regionA-kafka:9092
      regionB.bootstrap.servers=regionB-kafka:9092
      regionA->regionB.enabled=true
      regionA->regionB.topics=testTopic
      regionB->regionA.enabled=true
      regionB->regionA.topics=testTopic
      sync.topic.acls.enabled=false
      tasks.max=9
      

      Observed behavior

      • After deploying the 3 Mirror Maker instances (at the same time), replication does not work for one or both mirrors
        • If we scale down to a single instance of Mirror Maker and wait for about 5 minutes (refresh.topics.interval.seconds?), replication starts working. After this, scaling up to 3 correctly distributes the load between the deployed instances

      Expected behavior

      • Replication should work for all configured mirrors when running in distributed mode
      • When starting multiple instances of Mirror Maker at the same time, replication should work; a one-by-one rollout should not be required

      Additional details

      • When replication is not working, we observe that the task configurations in Mirror Maker's internal config topics have no partitions assigned, e.g. task.assigned.partitions is not set at all under the properties object.
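
The check described above can be sketched in Python. This is a minimal sketch, not part of the original report: it assumes the records of the internal config topic have already been consumed and decoded into (key, value) string pairs, that task configuration records use keys starting with "task-", and that each value is a JSON document with a "properties" object. The function name and the sample records are hypothetical.

```python
import json

# Property the report found missing from the task configurations.
ASSIGNMENT_KEY = "task.assigned.partitions"


def tasks_missing_assignment(records):
    """Return the keys of task config records that have no partitions assigned.

    `records` is an iterable of (key, value) string pairs already read from
    the internal config topic. Assumption: task records use keys that start
    with "task-" and JSON values containing a "properties" object.
    """
    missing = []
    for key, value in records:
        # Skip non-task records and records with an empty (null) value.
        if not key.startswith("task-") or value is None:
            continue
        properties = json.loads(value).get("properties", {})
        # Treat an absent or empty property as "no partitions assigned".
        if not properties.get(ASSIGNMENT_KEY):
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Hypothetical sample records, made up for illustration only.
    sample = [
        ("task-MirrorSourceConnector-0",
         '{"properties": {"task.assigned.partitions": "testTopic-0"}}'),
        ("task-MirrorSourceConnector-1",
         '{"properties": {"task.class": "example"}}'),
    ]
    for task_key in tasks_missing_assignment(sample):
        print("no partitions assigned:", task_key)
```

Reading the topic itself is deliberately left out, since the topic names and the client used to consume them depend on the deployment.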

      Workaround

      • As a workaround, we start the Mirror Maker instances one by one, with some delay between each instance. This allows the first instance to set up the configuration in the internal topics correctly. Doing this seems to ensure that replication works as expected.
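
The staggered start-up used as a workaround can be sketched as follows. This is an illustrative Python sketch under stated assumptions: `staggered_start` and the callables passed to it are hypothetical, and how an instance is actually launched (container orchestration, scripts, and so on) is outside its scope. The report does not state the delay used, so the value is left to the caller.

```python
import time


def staggered_start(start_fns, delay_seconds, sleep=time.sleep):
    """Start instances one by one, pausing between consecutive starts.

    `start_fns` is a list of zero-argument callables, each of which launches
    one Mirror Maker instance (hypothetical; supplied by the caller).
    `sleep` is injectable so the pause can be replaced in tests.
    Returns the number of instances started.
    """
    started = 0
    for index, start in enumerate(start_fns):
        # No pause before the first instance; pause before each later one so
        # the earlier instance has time to write its configuration.
        if index > 0:
            sleep(delay_seconds)
        start()
        started += 1
    return started


if __name__ == "__main__":
    # Placeholder launchers that only print; real ones would start a process.
    launchers = [
        lambda name=name: print("starting", name)
        for name in ("mm2-instance-1", "mm2-instance-2", "mm2-instance-3")
    ]
    staggered_start(launchers, delay_seconds=1)
```

The 1-second delay in the usage example only keeps the demo short; a real rollout would need a delay long enough for the first instance to finish writing to the internal topics.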

            People

              Assignee: Unassigned
              Reporter: Athanasios Fanos (tirkits)
              Votes: 2
              Watchers: 5
