Kafka issue KAFKA-1555

Provide strong consistency with reasonable availability


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.1.1
    • Fix Version/s: 0.8.2.0
    • Component/s: controller
    • Labels: None

    Description

      In a mission-critical application, we expect a Kafka cluster with 3 brokers to satisfy two requirements:
      1. When 1 broker is down, no message loss or service blocking happens.
      2. In worse cases, such as when two brokers are down, service may be blocked, but no message loss happens.

      We found that the current Kafka version (0.8.1.1) cannot meet these requirements because of three behaviors:
      1. When a new leader is chosen from 2 followers in the ISR, the one with fewer messages may be chosen.
      2. Even when replica.lag.max.messages=0, a follower can stay in the ISR while it has fewer messages than the leader.
      3. The ISR can contain only 1 broker, so acknowledged messages may be stored on only 1 broker.
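To illustrate behavior 1, here is a minimal Python sketch. It assumes, as a simplification, that the controller picks the first live broker listed in the ISR and never compares how many messages each candidate holds; the function names, the ISR ordering, and the message counts are hypothetical. A hypothetical "longest log wins" rule is shown alongside for contrast only.

```python
from typing import Dict, List, Optional, Set


def elect_first_live_in_isr(isr: List[str], live: Set[str]) -> Optional[str]:
    """Assumed rule: take the first live broker in the ISR list.
    Log length is never consulted."""
    for broker in isr:
        if broker in live:
            return broker
    return None


def elect_longest_log(isr: List[str], live: Set[str],
                      log_end: Dict[str, int]) -> Optional[str]:
    """Hypothetical contrast: the live ISR member holding the most messages."""
    candidates = [b for b in isr if b in live]
    if not candidates:
        return None
    return max(candidates, key=lambda b: log_end[b])


# Leader A fails. B holds 10 messages, C only 9, but C is still in the ISR.
isr = ["A", "C", "B"]
live = {"B", "C"}
log_end = {"A": 10, "B": 10, "C": 9}

print(elect_first_live_in_isr(isr, live))     # C: the follower with fewer messages
print(elect_longest_log(isr, live, log_end))  # B
```

Because membership in the ISR is the only criterion under the assumed rule, any lagging follower that is still listed (behavior 2) is an eligible leader.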

      The following is an analytical proof.
      Consider a cluster with 3 brokers and a topic with 3 replicas, and assume that initially all 3 replicas (leader A, followers B and C) are in sync, i.e., they hold the same messages and are all in the ISR.
      Depending on the value of request.required.acks (acks for short), there are the following cases.
      1. acks=0, 1, or 3. These settings clearly fail the requirements: with acks=0 or acks=1, a message the producer considers sent is lost if the leader fails before the followers replicate it; with acks=3, writes cannot be acknowledged once one broker is down, so service blocks.
      2. acks=2. The producer sends a message m, which is acknowledged by A and B. Although C has not received m, C is still in the ISR. If A is killed, C can be elected as the new leader, and consumers will miss m.
      3. acks=-1. B and C restart and are removed from the ISR. The producer sends a message m to A and receives an acknowledgement. A disk failure occurs on A before B and C replicate m, so m is lost.
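The two failure cases above can be replayed in a toy Python model. The ack rule (acks=N waits for N copies, acks=-1 waits for every broker currently in the ISR) and the election rule (first live broker in the ISR list) are simplifying assumptions, and all function names are hypothetical. The point is only that both scenarios end with an acknowledged message that no surviving replica holds.

```python
from typing import Dict, List, Set, Tuple


def acked(acks: int, holders: Set[str], isr: List[str]) -> bool:
    """Simplified ack rule (assumption): acks=N needs N copies,
    acks=-1 needs every broker currently in the ISR."""
    if acks == -1:
        return all(b in holders for b in isr)
    return len(holders) >= acks


def first_live_in_isr(isr: List[str], live: Set[str]) -> str:
    """Simplified election rule (assumption): first live ISR member."""
    return next(b for b in isr if b in live)


def holders_of(msg: str, logs: Dict[str, List[str]]) -> Set[str]:
    """Brokers whose log contains msg."""
    return {b for b, log in logs.items() if msg in log}


def case_acks_2() -> Tuple[bool, bool]:
    """Returns (producer got an ack, m survives on the new leader)."""
    logs: Dict[str, List[str]] = {"A": [], "B": [], "C": []}
    isr = ["A", "C", "B"]           # all three start in sync
    logs["A"].append("m")           # m reaches A and B, but not C
    logs["B"].append("m")
    ok = acked(2, holders_of("m", logs), isr)  # C is still listed in the ISR
    new_leader = first_live_in_isr(isr, live={"B", "C"})  # A is killed
    return ok, "m" in logs[new_leader]


def case_acks_all() -> Tuple[bool, bool]:
    """Returns (producer got an ack, m survives anywhere)."""
    logs: Dict[str, List[str]] = {"A": [], "B": [], "C": []}
    isr = ["A"]                     # B and C restarted and left the ISR
    logs["A"].append("m")
    ok = acked(-1, holders_of("m", logs), isr)  # the whole ISR is just A
    logs["A"].clear()               # disk failure on A before replication
    return ok, bool(holders_of("m", logs))


print(case_acks_2())    # (True, False): C became leader without m
print(case_acks_all())  # (True, False): the only copy was on A
```

In both cases the producer receives a success, yet the message does not survive, which is exactly the gap between "acknowledged" and "durable" that the cases above describe.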

      In summary, no existing configuration can satisfy the requirements.

      Attachments

        1. KAFKA-1555.0.patch (6 kB, Gwen Shapira)
        2. KAFKA-1555.1.patch (8 kB, Gwen Shapira)
        3. KAFKA-1555.2.patch (19 kB, Gwen Shapira)
        4. KAFKA-1555.3.patch (17 kB, Gwen Shapira)
        5. KAFKA-1555.4.patch (18 kB, Gwen Shapira)
        6. KAFKA-1555.5.patch (37 kB, Gwen Shapira)
        7. KAFKA-1555.5.patch (20 kB, Gwen Shapira)
        8. KAFKA-1555.6.patch (37 kB, Gwen Shapira)
        9. KAFKA-1555.8.patch (37 kB, Gwen Shapira)
        10. KAFKA-1555.9.patch (37 kB, Gwen Shapira)
        11. KAFKA-1555-DOCS.0.patch (206 kB, Gwen Shapira)
        12. KAFKA-1555-DOCS.1.patch (5 kB, Gwen Shapira)
        13. KAFKA-1555-DOCS.2.patch (6 kB, Gwen Shapira)
        14. KAFKA-1555-DOCS.3.patch (6 kB, Gwen Shapira)
        15. KAFKA-1555-DOCS.4.patch (6 kB, Gwen Shapira)


          People

            Gwen Shapira (gwenshap)
            Jiang Wu (jiangwu.mail@gmail.com)
            Joel Jacob Koshy
            Votes: 1
            Watchers: 19
