KAFKA-2143: Replicas get ahead of leader and fail


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.2.1
    • Fix Version/s: 0.9.0.1
    • Component/s: replication
    • Labels: None

    Description

      On a cluster of 6 nodes, we recently saw a case where a single under-replicated partition suddenly appeared, replication lag spiked, and network IO spiked. The cluster eventually appeared to recover on its own.

      Looking at the logs, the partition that failed was partition 7 of the topic background_queue. Its ISR was 1,4,3 and its leader at the time was broker 3. Here are the interesting log lines:

      On node 3 (the leader):

      [2015-04-23 16:50:05,879] ERROR [Replica Manager on Broker 3]: Error when processing fetch request for partition [background_queue,7] offset 3722949957 from follower with correlation id 148185816. Possible cause: Request for offset 3722949957 but we only have log segments in the range 3648049863 to 3722949955. (kafka.server.ReplicaManager)
      [2015-04-23 16:50:05,879] ERROR [Replica Manager on Broker 3]: Error when processing fetch request for partition [background_queue,7] offset 3722949957 from follower with correlation id 156007054. Possible cause: Request for offset 3722949957 but we only have log segments in the range 3648049863 to 3722949955. (kafka.server.ReplicaManager)
      [2015-04-23 16:50:13,960] INFO Partition [background_queue,7] on broker 3: Shrinking ISR for partition [background_queue,7] from 1,4,3 to 3 (kafka.cluster.Partition)
      

      Note that both replicas suddenly asked for an offset just ahead of the offsets available on the leader.
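
      To make the failure mode concrete, here is a minimal sketch (in Scala, with illustrative names; not Kafka's actual code) of the leader-side range check that produces the errors above: a fetch offset outside the range of retained log segments is rejected as out of range.

      object FetchRangeCheck {
        // The leader's retained offset range for a partition's log.
        final case class LogRange(startOffset: Long, endOffset: Long)

        // Reject fetch offsets outside [startOffset, endOffset], as the leader
        // did for offset 3722949957 in the log lines above.
        def validateFetchOffset(log: LogRange, fetchOffset: Long): Either[String, Long] =
          if (fetchOffset < log.startOffset || fetchOffset > log.endOffset)
            Left(s"Request for offset $fetchOffset but we only have log segments " +
              s"in the range ${log.startOffset} to ${log.endOffset}.")
          else
            Right(fetchOffset)

        def main(args: Array[String]): Unit = {
          val leaderLog = LogRange(3648049863L, 3722949955L) // values from the logs above
          println(validateFetchOffset(leaderLog, 3722949957L)) // prints the Left(...) error
        }
      }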

      And on nodes 1 and 4 (the replicas), many occurrences of the following:

      [2015-04-23 16:50:05,935] INFO Scheduling log segment 3648049863 for log background_queue-7 for deletion. (kafka.log.Log)
      

      Based on my reading, it looks like the replicas somehow got ahead of the leader, asked for an invalid offset, got confused, and re-replicated the entire partition from scratch to recover (this matches our network graphs, which show 3 sending a bunch of data to 1 and 4).
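
      For readers unfamiliar with the follower code path, here is a rough sketch (in Scala; my simplification of the behaviour described above, with illustrative names, not the actual fetcher-thread code) of the assumed recovery logic when a follower's fetch is rejected as out of range: either truncate back to the leader's latest offset, or throw away the local log and re-fetch everything from the leader's earliest offset. The second branch would explain both the segment-deletion log lines on nodes 1 and 4 and the network spike from node 3.

      object FollowerOutOfRangeSketch {
        // Offsets a follower can ask the leader for when its fetch offset is invalid.
        final case class LeaderOffsets(earliest: Long, latest: Long)

        // Returns the offset to resume fetching from and whether the local
        // log must be deleted first (i.e. full re-replication).
        def handleOffsetOutOfRange(followerLogEndOffset: Long,
                                   leader: LeaderOffsets): (Long, Boolean) =
          if (followerLogEndOffset > leader.latest)
            (leader.latest, false)  // follower is ahead: truncate and resume
          else
            (leader.earliest, true) // follower is lost: delete log, start over
      }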

      Taking a stab in the dark at the cause: is there a race condition where the replicas can receive a new offset before the leader has committed it and is ready to replicate it?
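
      To spell out that hypothesis (purely illustrative; this is a guess, not a confirmed mechanism), a toy interleaving in which the follower's fetch offset advances past what the leader is ready to serve:

      object HypothesizedRaceSketch {
        def main(args: Array[String]): Unit = {
          val leaderServableEndOffset = 3722949955L // last offset the leader can serve
          var followerFetchOffset     = 3722949955L

          // Hypothesis: the follower learns of new messages before the leader
          // has committed them and is ready to serve them...
          followerFetchOffset = 3722949957L

          // ...so the follower's next fetch is rejected, matching the
          // ReplicaManager errors above.
          if (followerFetchOffset > leaderServableEndOffset)
            println(s"Request for offset $followerFetchOffset but we only have " +
              s"log segments up to $leaderServableEndOffset.")
        }
      }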



          People

            Assignee: Jiangjie Qin (becket_qin)
            Reporter: Evan Huus (eapache)
            Reviewer: Gwen Shapira
            Votes: 8
            Watchers: 26

