Kafka / KAFKA-7122

Data is lost when ZooKeeper times out

    Details

    • Type: Bug
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: 0.11.0.2
    • Fix Version/s: None
    • Component/s: core, replication
    • Labels:
      None

      Description

      Noticed that a Kafka cluster will lose data when the leader for a partition has its ZooKeeper connection time out.

      Sequence of events:

      1. Say broker A leads a partition, with brokers B and C as followers.
      2. A ZK node has a network issue, and it happens to be the node broker A is connected to. Let's say this happens at offset X.
      3. The Kafka controller immediately selects broker C as the new partition leader.
      4. Broker A does not time out from ZooKeeper for another 4 seconds. During that time broker A still thinks it is the leader and presumably accepts producer writes.
      5. Broker A detects the ZK timeout and leaves the ISR.
      6. Broker A reconnects to ZK and rejoins the cluster as a follower for the partition.
      7. Broker A truncates its log to some offset Y such that Y > X. Broker A then catches up normally and rejoins the ISR.
      8. The ISR members for the partition are now in an inconsistent state:
        1. Broker C has all offsets X through Y plus everything after
        2. Broker B has all offsets X through Y plus everything after
        3. Broker A has offsets up to X and after Y. Everything between X and Y IS MISSING
      9. Within 5 minutes, the controller triggers a preferred replica election, making broker A the new leader for the partition (this is the default behavior).

      After step 9, no consumer will receive the messages for offsets between X and Y.

      The root problem seems to be that broker A truncates to offset Y when rejoining the cluster. It should truncate further back, to offset X, to prevent data loss.
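      A toy model of the sequence above may make the difference between truncating to Y and truncating to X concrete. This is a hypothetical sketch, not Kafka's replication code: logs are plain Python lists of (offset, payload) tuples, and `divergence_point` and `rejoin` are made-up helpers. It assumes that the records broker A holds between X and Y are writes it accepted while it still believed it was leader (tagged `a…`), while broker C's writes as the new leader are tagged `c…`. The issue does not say how Y is chosen, so here A's log simply ends at Y; X = 3 and Y = 6 are arbitrary.

```python
from typing import List, Tuple

# Hypothetical model: a partition log is a list of (offset, payload) records.
Record = Tuple[int, str]
Log = List[Record]


def divergence_point(follower: Log, leader: Log) -> int:
    """Return the first offset at which the follower's log differs from the leader's."""
    for offset, (f, l) in enumerate(zip(follower, leader)):
        if f != l:
            return offset
    # No conflicting record: the logs agree up to the end of the shorter one.
    return min(len(follower), len(leader))


def rejoin(follower: Log, leader: Log, truncate_to: int) -> Log:
    """Truncate the follower at truncate_to, then copy the leader's records from there on."""
    truncate_to = min(truncate_to, len(follower))  # cannot truncate past the log end
    return follower[:truncate_to] + leader[truncate_to:]


X, Y = 3, 6  # arbitrary example offsets

# Offsets 0..X-1 were replicated to A, B and C before A's ZK connection broke.
prefix = [(o, f"m{o}") for o in range(X)]
# C's log after it became leader at offset X (B holds the same records).
c_log = prefix + [(o, f"c{o}") for o in range(X, 10)]
# Assumed: records A appended at X..Y-1 while it still believed it was the leader.
a_log = prefix + [(o, f"a{o}") for o in range(X, Y)]

buggy = rejoin(a_log, c_log, truncate_to=Y)  # behavior reported in this issue
fixed = rejoin(a_log, c_log, truncate_to=divergence_point(a_log, c_log))  # truncate to X

lost = [r for r in c_log if r not in buggy]
print("missing on A after truncating to Y:", lost)
print("A matches C after truncating to X:", fixed == c_log)
```

      In this model, truncating to the first offset where the two logs disagree makes A an exact copy of C, while truncating to Y leaves A's own records at X..Y-1 in place of C's. How a real broker would locate that divergence point (for example by tracking which leader wrote each record) is outside the scope of this sketch.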

            People

            • Assignee:
              Unassigned
              Reporter:
              Nick Lipple
            • Votes:
              0
              Watchers:
              6