  1. Kafka
  2. KAFKA-12659

Mirrormaker 2 - seeking to wrong offsets on restart


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.0
    • Fix Version/s: None
    • Component/s: mirrormaker
    • Labels: None
    • Environment: Docker container based on openjdk11:alpine-slim, running on Amazon ECS

    Description

      We are running a dedicated MirrorMaker 2 (MM2) cluster with three tasks and have been trialing it for a few weeks on a single topic. It has been going fine, so we attempted to add a second topic, changing the MM2 config file from

      topics = sports

      to 

      topics = sports|translations 

       

      We noticed the following day that replication of the new topic was not working. Reading online, it seems others have had similar issues, perhaps related to the config stored in the internal mm2-configs topic not refreshing from the file. Following the recommendations in that thread, we stopped the tasks for 10 minutes, and eventually the new topic started replicating.

      However, we also noticed later that MM2 had started re-replicating about 5 million records from earlier that day (from the original topic), which was concerning. A few hours later I restarted the MM2 tasks and the same thing happened: it started re-replicating the same old messages.

      Looking into the mm2-offsets-{source}.internal topic, I could see that the records which track offsets had switched partitions. For example, the records for the sports-7 topic-partition went from being written to partition 5 (in mm2-offsets) to partition 8. The same occurred for most, but not all, of the other partitions.

      Following the task restarts, I can see in the MM2 logs that MM2 always seeks to offset 42741034 for sports-7. This value matches the oldest offset record on mm2-offsets partition 5, so it looks like MM2 is ignoring the more recent offset records on partition 8 and therefore not seeking to the correct latest offsets.
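One way a stale record could win is sketched below. It assumes, and this is not confirmed by the report, that the offsets are loaded into an in-memory map where whichever record is consumed last for a key overwrites earlier ones, and that ordering is only guaranteed within a single partition. The `replay` helper and the newer offset values (47000000, 47500000) are hypothetical; only 42741034 comes from the logs.

```python
from typing import Dict, Iterable, Tuple

# (offsets-topic partition, key, stored source offset)
Record = Tuple[int, str, int]


def replay(records: Iterable[Record]) -> Dict[str, int]:
    """Build a key -> offset map where the last record consumed wins."""
    latest: Dict[str, int] = {}
    for _partition, key, offset in records:
        # No timestamp or partition check: consumption order alone decides.
        latest[key] = offset
    return latest


# Stale record left behind on partition 5 (value taken from the MM2 logs).
old_p5 = [(5, "sports-7", 42741034)]
# Newer records on partition 8 (hypothetical values for illustration).
new_p8 = [(8, "sports-7", 47000000), (8, "sports-7", 47500000)]

# If partition 5 happens to be consumed after partition 8, the stale offset wins.
print(replay(new_p8 + old_p5)["sports-7"])  # 42741034
# If partition 8 is consumed last, the newest offset wins.
print(replay(old_p5 + new_p8)["sports-7"])  # 47500000
```

If the reader behaves like this, the result after a restart depends on the interleaving of the two partitions, which would be consistent with MM2 repeatedly seeking back to the old offset.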

      This also appears to affect compaction of the internal offsets topic: while the older records on partition 8 for the sports-7 key are being cleaned up, the even older records for that same key on partition 5 are not.
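The compaction behaviour described here is what per-partition log compaction would be expected to produce, since Kafka compacts each partition's log independently and keeps the latest record per key within that partition. The following is a simplified model, not the broker's actual cleaner (it ignores segments, the dirty ratio, and tombstones); the `compact` helper and the newer offset values are hypothetical.

```python
from typing import Dict, List, Tuple

# (partition, key, value) in append order
Record = Tuple[int, str, int]


def compact(log: List[Record]) -> List[Record]:
    """Keep only the last record for each (partition, key) pair."""
    last_index: Dict[Tuple[int, str], int] = {}
    for i, (partition, key, _value) in enumerate(log):
        last_index[(partition, key)] = i
    return [
        rec for i, rec in enumerate(log)
        if last_index[(rec[0], rec[1])] == i
    ]


log = [
    (5, "sports-7", 42741034),  # written before the key moved partitions
    (8, "sports-7", 47000000),  # hypothetical newer record
    (8, "sports-7", 47500000),  # hypothetical newest record
]

# The record on partition 5 survives because it is still the latest
# record for that key within its own partition.
print(compact(log))  # [(5, 'sports-7', 42741034), (8, 'sports-7', 47500000)]
```

Under this model, a key that has moved to another partition leaves its last old record behind indefinitely, because nothing newer is ever written to the old partition to supersede it.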

      I can't be certain that introducing the second topic into the MM2 config was the trigger for the change in partitioning behaviour. I am not sure why it would be, unless adding more topics to the replication list caused MM2 to automatically scale the number of partitions on the mm2-offsets-{source}.internal topic, which I guess might affect partitioning. However, it was the only noteworthy thing that we consciously changed within the same rough timeframe.
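To illustrate why a change in partition count would move keys, here is a minimal sketch of how Kafka's default partitioner places a keyed record: a 32-bit murmur2 hash of the key bytes, made non-negative, modulo the number of partitions. The key string and the partition counts (25 and 50) are assumptions for illustration only; the report does not state the actual key format or partition counts.

```python
def murmur2(data: bytes) -> int:
    """32-bit MurmurHash2 with Kafka's seed, returned as an unsigned int."""
    seed = 0x9747B28C
    m = 0x5BD1E995
    r = 24
    mask = 0xFFFFFFFF
    length = len(data)
    h = (seed ^ length) & mask
    # Mix the input four bytes at a time, little-endian.
    for i in range(0, length - length % 4, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & mask
        k ^= k >> r
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k
    # Fold in the trailing 1 to 3 bytes, if any.
    tail = length % 4
    base = length - tail
    if tail == 3:
        h ^= data[base + 2] << 16
    if tail >= 2:
        h ^= data[base + 1] << 8
    if tail >= 1:
        h ^= data[base]
        h = (h * m) & mask
    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h


def partition_for(key: bytes, num_partitions: int) -> int:
    """Non-negative hash modulo the partition count."""
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions


# Hypothetical offsets-record key; the real key format is an assumption.
key = b'["MirrorSourceConnector",{"cluster":"source","partition":7,"topic":"sports"}]'

# Hypothetical partition counts before and after a resize.
print(partition_for(key, 25), partition_for(key, 50))
```

Because the partition is the hash modulo the partition count, the same key can map to a different partition once the count changes, while some keys stay where they were. That would match the observation that most, but not all, keys moved.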

      Attached is a screenshot to help illustrate the issue.


      Attachments

        1. partitions.png
          355 kB
          stuart

        People

            Assignee: Unassigned
            Reporter: stuart (stuparty)
            Votes: 0
            Watchers: 2