Kafka / KAFKA-4666

Failure test for Kafka configured for consistency vs availability

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: documentation
    • Labels: None

    Description

      We recently had an issue with our Kafka setup because of a misconfiguration.

      In short, we thought we had configured Kafka for durability, but we had not set the producers to acks=all. During a full outage, some partitions ended up "partitioned": the followers started without properly waiting for the right leader, and thus we lost data. Again, this is not an issue with Kafka, but a misconfiguration on our side.

      I think we reproduced the issue, and we built a Docker-based test that shows that, if the producer isn't set with acks=all, data can be lost during an almost full outage. The test is attached.
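To illustrate the failure mode in isolation, here is a toy in-memory model. It is not the attached test and it does not talk to Kafka. The `Replica` class, the `replicate_every` lag parameter and the helper names are all made up for this sketch; it only assumes that with acks=1 a follower may lag behind the leader when a message is acknowledged, while with acks=all it may not.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    """A broker's copy of one partition, reduced to an in-memory list."""
    log: List[str] = field(default_factory=list)


def produce(leader, followers, messages, acks, replicate_every=2):
    """Append messages to the leader; return those acknowledged to the producer.

    Assumption of this toy model: with acks="all" the followers are caught up
    before each acknowledgement; otherwise they only catch up on every
    `replicate_every`-th message.
    """
    acked = []
    for count, message in enumerate(messages, start=1):
        leader.log.append(message)
        if acks == "all" or count % replicate_every == 0:
            for follower in followers:
                follower.log = list(leader.log)
        acked.append(message)
    return acked


def lost_after_failover(acked, new_leader):
    """Messages the producer believes are safe but the new leader lacks."""
    return [m for m in acked if m not in new_leader.log]


def simulate(acks):
    """Produce five messages, lose the leader, and promote the follower."""
    leader, follower = Replica(), Replica()
    acked = produce(leader, [follower], ["m1", "m2", "m3", "m4", "m5"], acks)
    # The original leader is gone after the outage; the follower takes over.
    return lost_after_failover(acked, follower)


print("lost with acks=1:  ", simulate("1"))
print("lost with acks=all:", simulate("all"))
```

In this model the acks=1 run loses the last acknowledged message, because the follower that becomes leader never received it, while the acks=all run loses nothing.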

      I was thinking of sending a PR, but wanted to run this by you first, as the test doesn't necessarily prove that a feature works as expected.

      In addition, I think the documentation could be slightly improved, for instance in the section:
      http://kafka.apache.org/documentation/#design_ha
      by clearly stating that there are 3 steps to configure Kafka for consistency, the third being that producers should be set with acks=all (which is currently part of the 2nd point).
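As a rough sketch of what those three steps could look like as a checklist, the snippet below validates plain dictionaries of settings. The function name `check_consistency_config` and the "at least 2" threshold for min.insync.replicas are my own assumptions. The setting names unclean.leader.election.enable, min.insync.replicas and acks are the real Kafka configuration keys.

```python
def check_consistency_config(topic_config, producer_config, replication_factor):
    """Return a list of problems; an empty list means all three steps are covered.

    Hypothetical helper for illustration: it inspects plain dicts and does not
    query a running cluster.
    """
    problems = []

    # Step 1: never elect an out-of-sync replica as leader.
    unclean = str(topic_config.get("unclean.leader.election.enable", "")).lower()
    if unclean != "false":
        problems.append("set unclean.leader.election.enable=false explicitly")

    # Step 2: require a minimum in-sync replica set (Kafka's default is 1).
    # Assumption: at least 2 and at most the replication factor.
    min_isr = int(topic_config.get("min.insync.replicas", 1))
    if not 2 <= min_isr <= replication_factor:
        problems.append("set min.insync.replicas between 2 and the replication factor")

    # Step 3: producers must wait for all in-sync replicas ("-1" means "all").
    acks = str(producer_config.get("acks", "")).lower()
    if acks not in ("all", "-1"):
        problems.append("set acks=all on every producer")

    return problems


topic = {"unclean.leader.election.enable": "false", "min.insync.replicas": 2}
print(check_consistency_config(topic, {"acks": "1"}, replication_factor=3))
print(check_consistency_config(topic, {"acks": "all"}, replication_factor=3))
```

The first call reports only the missing acks=all, which is the misconfiguration described above. The second call reports nothing.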

      Please let me know what you think, and I can send a PR if you agree.

      Attachments

        1. consistency_test.py
          7 kB
          Emanuele Cesena

          People

            Assignee: Unassigned
            Reporter: Emanuele Cesena (ecesena)
            Votes: 0
            Watchers: 2
