
Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed

    Description

      Consider the following scenario:

      Three replicas: A, B, and C. In epoch=1, replica A is the leader and writes up to offset 10. The leader then fails with the high watermark at offset 8. At that point, replica B had caught up to offset 10 while replica C was at offset 8. Suppose that C is elected with epoch=2 and immediately writes records up to offset 10. However, it also fails before these records become committed, and replica B is then elected with epoch=3 and writes records up to offset 12. The epoch cache on each replica will look like the following:

      Replica A:
      (epoch=1, start_offset=0)

      Replica B:
      (epoch=1, start_offset=0)
      (epoch=3, start_offset=10)

      Replica C:
      (epoch=1, start_offset=0)
      (epoch=2, start_offset=8)

      Suppose C comes back online. It will attempt to fetch at offset=10 with last_fetched_epoch=2, the last epoch in its own cache. The leader B will detect log divergence and will return truncation_offset=10. Replica C will truncate to offset 10 (a no-op), retry the same fetch, and remain stuck in this loop.
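
The stuck loop can be reproduced with a small sketch. This is not Kafka's implementation: the helper names (`end_offset_for_epoch`, `leader_truncation_offset`, `simulate_stuck_follower`) and the exact divergence check are assumptions made for illustration, using only the epoch caches from the scenario and the fact that C's last epoch is 2.

```python
# Epoch caches from the scenario, as (epoch, start_offset) pairs.
LEADER_B = [(1, 0), (3, 10)]
LEADER_B_LOG_END = 12


def end_offset_for_epoch(cache, log_end, epoch):
    """Return (epoch, end_offset) for the largest cached epoch <= `epoch`.

    An epoch's end offset is the start offset of the next cached epoch,
    or the log end offset if it is the latest epoch. Returns None if no
    cached epoch qualifies.
    """
    result = None
    for i, (cached_epoch, _start) in enumerate(cache):
        if cached_epoch > epoch:
            break
        end = cache[i + 1][1] if i + 1 < len(cache) else log_end
        result = (cached_epoch, end)
    return result


def leader_truncation_offset(cache, log_end, fetch_offset, last_fetched_epoch):
    """Hypothetical offset-only divergence check on the leader.

    Returns None when the fetch position is consistent with the leader's
    log, otherwise the single truncation offset sent to the follower.
    """
    found = end_offset_for_epoch(cache, log_end, last_fetched_epoch)
    if found is None:
        return 0
    epoch, end = found
    if epoch == last_fetched_epoch and fetch_offset <= end:
        return None
    return min(end, fetch_offset)


def simulate_stuck_follower(rounds=3):
    """Replica C fetching at offset 10 with last_fetched_epoch=2."""
    fetch_offset, last_fetched_epoch = 10, 2
    history = []
    for _ in range(rounds):
        truncation = leader_truncation_offset(
            LEADER_B, LEADER_B_LOG_END, fetch_offset, last_fetched_epoch
        )
        history.append(truncation)
        if truncation is None:
            break
        # Truncating to an offset at or beyond the log end changes nothing,
        # so the next fetch is identical to the previous one.
        fetch_offset = min(fetch_offset, truncation)
    return history
```

Under these assumptions, every round yields the same truncation offset of 10, so the follower's state never changes between fetches.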

      To fix this, I see two options:

      Option 1: In the case that the truncation offset equals the fetch offset, we can instead return the previous epoch end offset. In this example, we would return truncation_offset=0. The downside is that this causes unnecessary truncation.

      Option 2: Rather than returning only the truncation offset, we can have the leader return both the previous "diverging" epoch and its end offset. In this example, B would return diverging_epoch=1, end_offset=10. Replica C would then know to truncate to offset 8.
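
A minimal sketch of how Option 2 resolves the example, assuming the follower compares the returned epoch against its own epoch cache. The function names (`find_diverging_epoch`, `follower_truncation_offset`) and the min-based truncation rule are illustrative assumptions, not the actual Kafka code.

```python
# Epoch caches from the scenario, as (epoch, start_offset) pairs.
LEADER_B = [(1, 0), (3, 10)]
LEADER_B_LOG_END = 12
FOLLOWER_C = [(1, 0), (2, 8)]
FOLLOWER_C_LOG_END = 10


def end_offset_for_epoch(cache, log_end, epoch):
    """Return (epoch, end_offset) for the largest cached epoch <= `epoch`."""
    result = None
    for i, (cached_epoch, _start) in enumerate(cache):
        if cached_epoch > epoch:
            break
        end = cache[i + 1][1] if i + 1 < len(cache) else log_end
        result = (cached_epoch, end)
    return result


def find_diverging_epoch(cache, log_end, fetch_offset, last_fetched_epoch):
    """Leader side: return (diverging_epoch, end_offset), or None if consistent."""
    found = end_offset_for_epoch(cache, log_end, last_fetched_epoch)
    if found is None:
        return None
    epoch, end = found
    if epoch == last_fetched_epoch and fetch_offset <= end:
        return None
    return (epoch, end)


def follower_truncation_offset(cache, log_end, diverging_epoch, leader_end_offset):
    """Follower side: truncate to the smaller of the leader's end offset and
    the local end offset of the largest epoch <= the diverging epoch."""
    found = end_offset_for_epoch(cache, log_end, diverging_epoch)
    if found is None:
        return 0
    _local_epoch, local_end = found
    return min(local_end, leader_end_offset)


# Replica C fetches at offset 10 with last_fetched_epoch=2.
diverging = find_diverging_epoch(LEADER_B, LEADER_B_LOG_END, 10, 2)
truncate_to = follower_truncation_offset(
    FOLLOWER_C, FOLLOWER_C_LOG_END, diverging[0], diverging[1]
)
```

Here the leader reports epoch 1 ending at offset 10, while C's own cache shows epoch 1 ending at offset 8 (where its epoch 2 begins), so C truncates to 8 and drops the uncommitted epoch=2 records.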

      The second option is what was initially specified in the Raft proposal, but we changed it during the discussion because we had not considered this case and thought the response could be simplified. My inclination is to restore the originally specified truncation logic.

          People

            hachikuji Jason Gustafson