Kafka / KAFKA-15905

Restarts of MirrorCheckpointTask should not permanently interrupt offset translation

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.6.0
    • Fix Version/s: 3.8.0, 3.7.1
    • Component/s: mirrormaker
    • Labels: None

    Description

      Executive summary: When the MirrorCheckpointTask restarts, it loses the state of checkpointsPerConsumerGroup, which limits offset translation to records mirrored after the latest restart.

      For example, if 1000 records are mirrored and the OffsetSyncs are read by MirrorCheckpointTask, the emitted checkpoints are cached, and translation can happen at the ~500th record. If MirrorCheckpointTask restarts, and 1000 more records are mirrored, translation can happen at the ~1500th record, but no longer at the ~500th record.
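
      The example above can be illustrated with a toy model. This is a minimal sketch, not MM2 code: ToySyncStore, record_sync, and translate are hypothetical names, the upstream-to-downstream mapping is assumed to be the identity, and the store is assumed to hold only syncs recorded during the current task generation.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


class ToySyncStore:
    """Toy stand-in for an in-memory offset-sync cache (not MM2's OffsetSyncStore)."""

    def __init__(self) -> None:
        # (upstream_offset, downstream_offset) pairs, appended in ascending order.
        self._syncs: List[Tuple[int, int]] = []

    def record_sync(self, upstream: int, downstream: int) -> None:
        self._syncs.append((upstream, downstream))

    def translate(self, upstream_group_offset: int) -> Optional[int]:
        """Return the downstream offset of the latest sync at or before the
        group's upstream offset, or None if no such sync is cached."""
        idx = bisect_right(self._syncs, (upstream_group_offset, float("inf"))) - 1
        return self._syncs[idx][1] if idx >= 0 else None


# First task generation: syncs cover records 0..1000 (identity mapping assumed).
before_restart = ToySyncStore()
for upstream in (0, 250, 500, 750, 1000):
    before_restart.record_sync(upstream, upstream)

# Second generation: in this model the cache starts empty and only holds
# syncs for records mirrored after the restart.
after_restart = ToySyncStore()
for upstream in (1250, 1500, 1750, 2000):
    after_restart.record_sync(upstream, upstream)

at_500_before = before_restart.translate(500)   # translatable before the restart
at_1500_after = after_restart.translate(1500)   # translatable after the restart
at_500_after = after_restart.translate(500)     # no usable sync any more
```

      In this sketch a consumer group sitting at the ~500th record can be translated before the restart but gets no result afterwards, which is the regression this issue describes.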

      Context:

      Before KAFKA-13659, MM2 made translation decisions based on the incompletely-initialized OffsetSyncStore, and the checkpoint could appear to go backwards temporarily during restarts. To fix this, we forced the OffsetSyncStore to initialize completely before translation could take place, ensuring that the latest OffsetSync had been read, and thus providing the most accurate translation.

      Before KAFKA-14666, MM2 translated offsets based only on the latest OffsetSync. Afterwards, it kept an in-memory sparse cache of historical OffsetSyncs to allow translation of earlier offsets. The caveat was that the cache's sparseness could make translations go backwards permanently. To prevent this, a cache of the latest Checkpoints was kept in the MirrorCheckpointTask#checkpointsPerConsumerGroup variable, and offset translation remained restricted to the fully-initialized OffsetSyncStore.

      Effectively, the MirrorCheckpointTask ensures that it translates based on an OffsetSync emitted during its lifetime, to ensure that no previous MirrorCheckpointTask emitted a later sync. If we can read the checkpoints emitted by previous generations of MirrorCheckpointTask, we can still ensure that checkpoints are monotonic, while allowing translation further back in history.
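
      The proposal can be sketched as a checkpoint cache that is seeded from previously emitted checkpoints on startup. This is an illustrative model under stated assumptions, not the actual fix: ToyCheckpointCache, maybe_emit, and the (group, topic, partition) key are hypothetical, and a checkpoint is assumed to be suppressed whenever it would not move the downstream offset forward.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key: (consumer group, topic, partition).
GroupPartition = Tuple[str, str, int]


class ToyCheckpointCache:
    """Toy model of a per-group cache of the latest emitted checkpoints."""

    def __init__(self, restored: Optional[Dict[GroupPartition, int]] = None) -> None:
        # Seed the cache with checkpoints emitted by earlier task generations, if any.
        self._latest: Dict[GroupPartition, int] = dict(restored or {})

    def maybe_emit(self, key: GroupPartition, downstream_offset: int) -> bool:
        """Record and 'emit' the checkpoint only if it advances the downstream offset."""
        previous = self._latest.get(key)
        if previous is not None and downstream_offset <= previous:
            return False  # would rewind or repeat: suppress
        self._latest[key] = downstream_offset
        return True

    def snapshot(self) -> Dict[GroupPartition, int]:
        return dict(self._latest)


key = ("group-a", "topic-1", 0)

# First generation emits a checkpoint at downstream offset 480.
first = ToyCheckpointCache()
emitted_first = first.maybe_emit(key, 480)

# Restart without restoring state: an older translation (300) is accepted,
# so the checkpoint goes backwards.
amnesiac = ToyCheckpointCache()
rewound = amnesiac.maybe_emit(key, 300)

# Restart with restored checkpoints: the rewind is rejected, progress is accepted.
restored = ToyCheckpointCache(restored=first.snapshot())
rejected = restored.maybe_emit(key, 300)
advanced = restored.maybe_emit(key, 520)
```

      With the restored cache acting as the monotonicity guard, translation in this model no longer has to be restricted to syncs seen during the current task's lifetime.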

            People

              Assignee: Edoardo Comar (ecomar)
              Reporter: Greg Harris (gharris1727)
              Votes: 1
              Watchers: 4
