Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Description
There are two ways to implement "mirroring" (i.e. replicating a topic from one cluster to another):
1. Do a simple read from the source and write to the destination, with no attempt to maintain the same partitioning or offsets in the destination cluster. In this case the destination cluster may have a different number of partitions, and you can even read from many clusters to create a merged cluster. This flexibility is nice. The downside is that, since the partitioning and offsets are not the same, a consumer of the source cluster has no equivalent position in the destination cluster. This is the style of mirroring we have implemented in the mirror-maker tool and use for datacenter replication today.
2. The second style of replication would only allow creating an exact replica of a single source cluster (i.e. all partitions and offsets exactly the same). The nice thing about this is that a consumer's position in the source cluster maps directly to the same position in the destination cluster. The downside is that it is not possible to merge multiple source clusters this way or to use different partitioning. We do not currently support this in mirror maker.
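To make the trade-off between the two styles concrete, here is a rough in-memory sketch. It does not use Kafka's API: a "cluster" is assumed to be a dict from partition id to a list of `(key, value)` messages, a message's offset is its list index, and the function names and the byte-sum partitioner are hypothetical choices made only for illustration.

```python
# Illustrative in-memory model only; none of these names are Kafka APIs.
# A cluster is {partition_id: [(key, value), ...]} and a message's offset
# is its index in the partition's list.

def mirror_repartitioned(sources, num_dest_partitions):
    """Style 1: merge any number of source clusters, re-partitioning by key."""
    dest = {p: [] for p in range(num_dest_partitions)}
    for cluster in sources:
        for partition in sorted(cluster):
            for key, value in cluster[partition]:
                # Hypothetical deterministic partitioner: byte sum modulo
                # the destination partition count.
                target = sum(key.encode()) % num_dest_partitions
                dest[target].append((key, value))
    return dest


def mirror_exact(source):
    """Style 2: copy one source cluster partition for partition, in order."""
    return {p: list(messages) for p, messages in source.items()}
```

In the first function a message generally lands in a different partition and at a different offset than it had in the source, which is why a source consumer has no equivalent position. In the second, every `(partition, offset)` pair carries over unchanged, but only one source cluster can be mirrored.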
It would be nice to implement the second style as an option in mirror maker, since an exact replica is useful when you are replicating only a single cluster.
There are some nuances: to maintain the exact offsets, it is important to guarantee that the producer never resends or loses a message. As a result, it would be important to have only a single producer for each destination partition, and to check the last produced message on startup (using the getOffsets api) so that, in the case of a hard crash, messages that are re-consumed are not re-emitted.
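The startup check described above might look roughly like the following sketch. It is an in-memory model under stated assumptions, not mirror maker code: `DestinationPartition`, `next_offset`, and `mirror_partition` are hypothetical names, and `next_offset` merely stands in for whatever offset lookup the real tool would perform on startup.

```python
class DestinationPartition:
    """Hypothetical in-memory stand-in for one destination partition log."""

    def __init__(self):
        self.log = []

    def next_offset(self):
        # Stands in for the offset lookup performed on startup.
        return len(self.log)

    def append(self, message):
        self.log.append(message)


def mirror_partition(source_log, dest, consumer_position):
    """Copy source_log into dest so that offsets match exactly.

    consumer_position is where the mirror's consumer resumes. After a hard
    crash it may be behind what was already produced to the destination.
    """
    resume_from = dest.next_offset()
    if consumer_position > resume_from:
        # Resuming here would skip messages and break offset parity.
        raise RuntimeError("consumer is ahead of the destination log")
    for offset in range(consumer_position, len(source_log)):
        if offset < resume_from:
            continue  # re-consumed after the crash; do not re-emit
        dest.append(source_log[offset])
```

For example, if the first run produced offsets 0 to 2 and crashed with the consumer position saved at 1, a second call with `consumer_position=1` re-reads offsets 1 and 2 but appends only from offset 3 onward. The sketch relies on exactly one caller writing to each `DestinationPartition`, mirroring the single-producer requirement.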