KAFKA-1193

Data loss if broker is killed using kill -9

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0, 0.8.1
    • Fix Version/s: 0.8.2.0
    • Component/s: replication
    • Labels: None
    • Environment: CentOS 6.3

    Description

      We have a Kafka cluster of 2 nodes (running Kafka 0.8.0).
      Replication Factor: 2
      Number of partitions: 2

      Actual Behaviour:
      -------------------------
      If the leader node of the two goes down, data loss occurs.

      Steps to Reproduce:
      ------------------------------
      1. Create a 2-node Kafka cluster with replication factor 2.
      2. Start the Kafka cluster.
      3. Create a topic, say "test-trunk111".
      4. Restart any one node.
      5. Check the topic status using the kafka-list-topic tool.
      The topic ISR status is:

      topic: test-trunk111 partition: 0 leader: 0 replicas: 1,0 isr: 0,1
      topic: test-trunk111 partition: 1 leader: 0 replicas: 0,1 isr: 0,1

      If only one broker is in the ISR list, wait for some time and check the topic's ISR status again. There should be 2 brokers in the ISR list.
      6. Start producing data.
      7. Kill the leader node (broker-0 in our case) while data is being produced.
      8. After all data has been produced, start a consumer.
      9. Observe the behaviour. There is data loss.
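
One way to quantify the loss observed in step 9 is to tag each produced message with a sequence number and compare what was sent against what the consumer received. The sketch below is only an illustration: `find_lost_messages` is a hypothetical helper, and the ID lists are made-up sample data, not results from this cluster.

```python
def find_lost_messages(sent_ids, received_ids):
    """Return the IDs that were sent but never consumed, in send order."""
    received = set(received_ids)
    return [msg_id for msg_id in sent_ids if msg_id not in received]


# Made-up sample data: the producer sent IDs 0..9, the consumer saw only some.
sent = list(range(10))
received = [0, 1, 2, 3, 7, 8, 9]

lost = find_lost_messages(sent, received)
print(f"sent={len(sent)} received={len(set(received))} lost={lost}")
```

In a real run, the sent IDs would be logged by the producer only after a successful send, so that any ID reported as lost was one the producer believed to be acknowledged.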

      After the leader goes down, the topic ISR status is:

      topic: test-trunk111 partition: 0 leader: 1 replicas: 1,0 isr: 1
      topic: test-trunk111 partition: 1 leader: 1 replicas: 0,1 isr: 1
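
The status lines shown in this report can be checked programmatically to see whether every replica is still in the ISR. The following is a minimal sketch that assumes the exact line layout quoted above; `parse_status` and `under_replicated` are hypothetical helper names, not part of any Kafka tool.

```python
import re

# Matches the line layout quoted in this report, e.g.
# "topic: test-trunk111 partition: 0 leader: 0 replicas: 1,0 isr: 0,1"
STATUS_LINE = re.compile(
    r"topic:\s*(?P<topic>\S+)\s+partition:\s*(?P<partition>\d+)\s+"
    r"leader:\s*(?P<leader>-?\d+)\s+replicas:\s*(?P<replicas>[\d,]*)\s+"
    r"isr:\s*(?P<isr>[\d,]*)"
)


def _to_ids(text):
    """Convert a comma-separated broker list such as '1,0' to [1, 0]."""
    return [int(part) for part in text.split(",") if part]


def parse_status(line):
    """Parse one status line into a dict; raise ValueError if it does not match."""
    match = STATUS_LINE.search(line)
    if match is None:
        raise ValueError(f"unrecognised status line: {line!r}")
    return {
        "topic": match.group("topic"),
        "partition": int(match.group("partition")),
        "leader": int(match.group("leader")),
        "replicas": _to_ids(match.group("replicas")),
        "isr": _to_ids(match.group("isr")),
    }


def under_replicated(line):
    """True when at least one assigned replica is missing from the ISR."""
    status = parse_status(line)
    return set(status["isr"]) != set(status["replicas"])


before = "topic: test-trunk111 partition: 0 leader: 0 replicas: 1,0 isr: 0,1"
after = "topic: test-trunk111 partition: 0 leader: 1 replicas: 1,0 isr: 1"
print(under_replicated(before), under_replicated(after))
```

Applied to the two listings in this report, the check flags only the status taken after the leader was killed, where broker 0 has dropped out of the ISR.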

      We have tried the following to avoid data loss:
      ----------------------------------------------------------------

      1. Configured "request.required.acks=-1" in the producer configuration because, as mentioned in the documentation (http://kafka.apache.org/documentation.html#producerconfigs), setting this value to -1 guarantees that no messages will be lost.
      2. Increased "message.send.max.retries" from 3 to 10 in the producer configuration.
      3. Set "controlled.shutdown.enable" to true in the broker configuration.
      4. Tested with Kafka 0.8.1 after applying the patch KAFKA-1188.patch, available at https://issues.apache.org/jira/browse/KAFKA-1188

      None of the above helped when the leader node is killed using "kill -9 <pid>".
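
For reference, the settings tried above can be collected in one place. This is a sketch only: the keys and values are the ones named in this report, while `to_properties` is a hypothetical helper that renders them as key=value lines. Note that "kill -9" sends SIGKILL, which a process cannot intercept, so the broker has no opportunity to perform a controlled shutdown in that case.

```python
# Producer settings named in this report (items 1 and 2).
producer_settings = {
    "request.required.acks": "-1",
    "message.send.max.retries": "10",
}

# Broker setting named in this report (item 3).
broker_settings = {
    "controlled.shutdown.enable": "true",
}


def to_properties(settings):
    """Render a settings dict as key=value lines, one per setting."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


print("# producer")
print(to_properties(producer_settings))
print("# broker")
print(to_properties(broker_settings))
```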

      Expected Behaviour:
      ----------------------------
      No data should be lost.

          People

            Assignee: Unassigned
            Reporter: Hanish Bansal (hanish.bansal.agarwal)
            Votes: 4
            Watchers: 7