  1. Kafka
  2. KAFKA-905

Logs can have same offsets causing recovery failure

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:
      None

      Description

      Consider the following scenario, where L is the leader's log, F is the follower's log, and HW is each replica's high watermark:

      L              F
      1 m1,m2        1 m1,m2
      3 m3,m4        3 m3,m4
      5 m5,m6        5 m5,m6

      HW = 6         HW = 4

      The follower goes down and comes back up. On restart it truncates its log to its HW (4), which drops m5 and m6:

      L              F
      1 m1,m2        1 m1,m2
      3 m3,m4        3 m3,m4
      5 m5,m6

      HW = 6         HW = 4

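      Before the follower catches up with the leader, the leader goes down and the follower becomes the new leader. It then receives new messages. From here on, the columns are swapped: F is the former leader and L is the new leader.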

      F              L
      1 m1,m2        1 m1,m2
      3 m3,m4        3 m3,m4
      5 m5,m6        10 m5-m10

      HW=6           HW=4

      The follower fetches from offset 7. Since offset 7 falls inside the compressed message set at offset 10 on the leader, the whole compressed chunk is sent to the follower:

      F              L
      1 m1,m2        1 m1,m2
      3 m3,m4        3 m3,m4
      5 m5,m6        10 m5-m10
      10 m5-m10

      HW=4           HW=10

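      The follower's log now contains the same offsets twice: m5 and m6 appear both in its original message set and inside the fetched compressed set. On recovery, re-indexing will fail because of the repeated offsets.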
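
      The scenario above can be replayed with a small model. This is only a sketch, not Kafka code: MessageSet, shallow_fetch and duplicate_offsets are hypothetical names, and each message set is reduced to the range of offsets it contains.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MessageSet:
    """Hypothetical stand-in for one (possibly compressed) message set."""
    first_offset: int
    last_offset: int


def shallow_fetch(leader_log: List[MessageSet], fetch_offset: int) -> List[MessageSet]:
    """Return whole message sets that end at or after fetch_offset.

    A set that merely contains fetch_offset is returned in full,
    including the messages before fetch_offset.
    """
    return [ms for ms in leader_log if ms.last_offset >= fetch_offset]


def duplicate_offsets(log: List[MessageSet]) -> List[int]:
    """Return the offsets that occur in more than one message set, sorted."""
    seen, dupes = set(), set()
    for ms in log:
        for off in range(ms.first_offset, ms.last_offset + 1):
            (dupes if off in seen else seen).add(off)
    return sorted(dupes)


# New leader: m1-m4 as before, then m5-m10 in one compressed set.
leader_log = [MessageSet(1, 2), MessageSet(3, 4), MessageSet(5, 10)]
# Former leader, now the follower: it still holds its own m5,m6.
follower_log = [MessageSet(1, 2), MessageSet(3, 4), MessageSet(5, 6)]

log_end_offset = follower_log[-1].last_offset + 1  # 7
follower_log.extend(shallow_fetch(leader_log, log_end_offset))

print(duplicate_offsets(follower_log))  # [5, 6]
```

      Offsets 5 and 6 end up in the follower's log twice, which is the condition that breaks re-indexing on recovery.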

      Possible ways to fix this:
      1. The fetcher thread can do deep iteration instead of shallow iteration and drop the messages whose offsets are less than the follower's log end offset. This would, however, incur a performance hit.
      2. To optimize option 1, we could do deep iteration only until the logical offset of the fetched message set is greater than the log end offset of the follower log, and then switch to shallow iteration.
      3. On recovery, we just truncate the active segment and refetch the data.
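
      Options 1 and 2 could look roughly like the sketch below. It is an illustration under assumptions, not the fetcher thread's actual code: drop_already_present is a hypothetical helper, a message set is modelled as a plain list of (offset, payload) pairs, and the cost of decompressing and re-wrapping a trimmed set is ignored.

```python
from typing import List, Tuple

Message = Tuple[int, str]      # (offset, payload)
MessageSet = List[Message]     # one wrapper message, in offset order


def drop_already_present(fetched: List[MessageSet], log_end_offset: int) -> List[MessageSet]:
    """Keep only messages at or beyond the follower's log end offset."""
    kept: List[MessageSet] = []
    for i, message_set in enumerate(fetched):
        if message_set and message_set[0][0] >= log_end_offset:
            # Option 2: this set and everything after it is new, so the
            # remaining sets are appended as-is (shallow iteration).
            kept.extend(fetched[i:])
            break
        # Option 1: deep iteration, dropping messages the follower already has.
        fresh = [m for m in message_set if m[0] >= log_end_offset]
        if fresh:
            kept.append(fresh)
    return kept


fetched = [
    [(5, "m5"), (6, "m6"), (7, "m7"), (8, "m8"), (9, "m9"), (10, "m10")],
    [(11, "m11"), (12, "m12")],
]
trimmed = drop_already_present(fetched, log_end_offset=7)
print([[off for off, _ in ms] for ms in trimmed])  # [[7, 8, 9, 10], [11, 12]]
```

      Only the first fetched set is iterated deeply here. The second one starts past the log end offset and is passed through untouched, which is the saving option 2 is after.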

      All three options above are hacky. The right fix is to ensure we never corrupt the logs: we can incur data loss, but we should not compromise consistency. For 0.8, the simplest fix would be option 3.
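
      One possible reading of option 3 is sketched below. The details are assumptions, not taken from the attached patches: recover_active_segment is a hypothetical function, the active segment is modelled as a list of (first_offset, last_offset) ranges, and the whole log is assumed to sit in a single active segment with base offset 1.

```python
from typing import List, Tuple

OffsetRange = Tuple[int, int]  # (first_offset, last_offset) of one message set


def recover_active_segment(
    base_offset: int, segment: List[OffsetRange]
) -> Tuple[List[OffsetRange], int]:
    """Return (surviving message sets, offset to resume fetching from).

    If any message set starts at or before an offset that is already in the
    segment, the whole active segment is discarded and refetched.
    """
    next_expected = base_offset
    for first, last in segment:
        if first < next_expected:
            # Repeated offsets: truncate the active segment completely.
            return [], base_offset
        next_expected = last + 1
    return list(segment), next_expected


corrupt = [(1, 2), (3, 4), (5, 6), (5, 10)]
print(recover_active_segment(1, corrupt))  # ([], 1)

clean = [(1, 2), (3, 4), (5, 6)]
print(recover_active_segment(1, clean))    # ([(1, 2), (3, 4), (5, 6)], 7)
```

      This trades data in the active segment for consistency: the follower gives up its local copy and refetches from the leader instead of trying to repair the overlap in place.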

        Attachments

        1. KAFKA-905.patch
          5 kB
          Sriram
        2. KAFKA-905.rtf
          3 kB
          Sriram
        3. KAFKA-905-trunk.patch
          6 kB
          Sriram
        4. KAFKA-905-v2.patch
          6 kB
          Sriram

            People

            • Assignee:
              sriramsub Sriram
              Reporter:
              sriramsub Sriram
            • Votes:
              0
              Watchers:
              5