Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.6.0, 2.5.1
- Fix Version/s: None
- Component/s: None
Description
We believe we are experiencing a bug when deploying Mirror Maker 2 in distributed mode in our environments. Replication does not work consistently after the initial deployment, and it does not start working on its own even after some time (24h+).
Environment & replication set-up
- 2 regions with a separate Kafka cluster (let's call them Region A and Region B)
- 3 instances of Mirror Maker are deployed at the same time in Region B with the same configuration
- Replication is set up to be bi-directional (regionA->regionB and regionB->regionA)
Container Version
Observed with both confluentinc/cp-kafka:5.5.1 & confluentinc/cp-kafka:6.0.1
Mirror maker 2 configuration
clusters=regionA,regionB
regionA.bootstrap.servers=regionA-kafka:9092
regionB.bootstrap.servers=regionB-kafka:9092
regionA->regionB.enabled=true
regionA->regionB.topics=testTopic
regionB->regionA.enabled=true
regionB->regionA.topics=testTopic
sync.topic.acls.enabled=false
tasks.max=9
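For context, each instance runs the dedicated MM2 driver that ships with Kafka, pointed at the file above (the file name is from our deployment):

# Start one dedicated MM2 instance; mm2.properties is the configuration above
./bin/connect-mirror-maker.sh mm2.properties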
Observed behavior
- After deploying the 3 Mirror Maker instances (at the same time), replication for one or both mirrors does not work
- If we scale down to a single Mirror Maker instance and wait for about 5 minutes (possibly refresh.topics.interval.seconds? see the snippet below), replication starts working. After this, scaling back up to 3 instances correctly distributes the load between the deployed instances
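One way to check whether the ~5 minute delay tracks the topic refresh cycle would be to lower that interval in the MM2 configuration. The property is a standard MM2 setting; the value here is arbitrary and only for testing the correlation:

# Hypothetical test setting: shorten the topic refresh cycle to 60 seconds
refresh.topics.interval.seconds=60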
Expected behavior
- Replication should work for all configured mirrors when running in distributed mode
- When starting multiple Mirror Maker instances at the same time, replication should work; a one-by-one rollout should not be required
Additional details
- When replication is not working, we observe that in Mirror Maker's internal config topics the partitions are not assigned to the tasks, e.g. task.assigned.partitions is not set at all under the properties object (see the inspection sketch below).
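For reference, this is roughly how we inspected the task configuration. The topic name follows MM2's mm2-configs.<source-alias>.internal naming convention for dedicated clusters, and the broker address matches the configuration above:

# Dump the internal config topic for the regionA->regionB flow
# (the topic lives on the target cluster, regionB)
./bin/kafka-console-consumer.sh \
  --bootstrap-server regionB-kafka:9092 \
  --topic mm2-configs.regionA.internal \
  --from-beginning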
Workaround
- As a workaround, we start the Mirror Maker instances one by one with some delay between each instance (sketched below). This gives the first instance time to set up the configuration in the internal topics correctly. Doing this seems to ensure that replication works as expected.
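A minimal sketch of the staggered rollout; start-mm2-instance is a placeholder for our actual deployment step, and the delay value is arbitrary:

# Staggered startup: bring up one Mirror Maker instance at a time
for i in 1 2 3; do
  start-mm2-instance "$i"   # placeholder for the real deployment command
  sleep 120                 # arbitrary delay; lets the first instance write its config
done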
Issue Links
- duplicates
  - KAFKA-9981 Running a dedicated mm2 cluster with more than one nodes,When the configuration is updated the task is not aware and will lose the update operation. (Resolved)
- relates to
  - KAFKA-12150 Consumer group refresh not working with clustered MM2 setup (Resolved)
  - KAFKA-12893 MM2 fails to replicate if starting two+ nodes same time (Resolved)
  - KAFKA-10586 Full support for distributed mode in dedicated MirrorMaker 2.0 clusters (Resolved)