Index: 082/configuration.html
===================================================================
--- 082/configuration.html	(revision 1635887)
+++ 082/configuration.html	(working copy)
@@ -454,6 +454,13 @@
     This configuration controls how frequently the log compactor will attempt to clean the log (assuming log compaction is enabled). By default we will avoid cleaning a log where more than 50% of the log has been compacted. This ratio bounds the maximum space wasted in the log by duplicates (at 50%, at most 50% of the log could be duplicates). A higher ratio will mean fewer, more efficient cleanings but will mean more wasted space in the log.
+    min.insync.replicas
+    1
+    min.insync.replicas
+    When a producer sets request.required.acks to -1, min.insync.replicas specifies the minimum number of replicas that must acknowledge a write for the write to be considered successful. If this minimum cannot be met, then the producer will raise an exception (either NotEnoughReplicas or NotEnoughReplicasAfterAppend).
+    When used together, min.insync.replicas and request.required.acks allow you to enforce greater durability guarantees. A typical scenario would be to create a topic with a replication factor of 3, set min.insync.replicas to 2, and produce with request.required.acks of -1. This will ensure that the producer raises an exception if a majority of replicas do not receive a write.
+
     retention.bytes
     None
     log.retention.bytes
@@ -645,7 +652,7 @@
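To make the typical scenario above concrete, here is a minimal sketch using the 0.8.x high-level Java producer. It assumes a topic (called my-durable-topic here purely for illustration) that was created with a replication factor of 3 and a topic-level override of min.insync.replicas=2; the broker list is a placeholder as well. Only request.required.acks is the setting under discussion.

    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class DurableProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder broker list; replace with your own brokers.
            props.put("metadata.broker.list", "broker1:9092,broker2:9092,broker3:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // Wait for acknowledgement from all in-sync replicas.
            props.put("request.required.acks", "-1");
            // Synchronous producer so a failed send surfaces as an exception from send().
            props.put("producer.type", "sync");

            Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
            try {
                // "my-durable-topic" is assumed to have replication factor 3 and min.insync.replicas=2.
                producer.send(new KeyedMessage<String, String>("my-durable-topic", "key", "value"));
            } finally {
                producer.close();
            }
        }
    }

With this combination, if two of the three replicas drop out of the ISR, the broker rejects the write with NotEnoughReplicas and the send eventually fails (the sync producer surfaces the error once its retries are exhausted) instead of silently accepting a write held by a single replica.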

Index: 082/design.html =================================================================== --- 082/design.html (revision 1635887) +++ 082/design.html (working copy) @@ -227,6 +227,21 @@

This dilemma is not specific to Kafka. It exists in any quorum-based scheme. For example, in a majority voting scheme, if a majority of servers suffer a permanent failure, then you must either choose to lose 100% of your data or violate consistency by taking what remains on an existing server as your new source of truth.
+

Availability and Durability Guarantees

+When writing to Kafka, producers can choose whether they wait for the message to be acknowledged by 0, 1, or all (-1) replicas. Acknowledgement by all replicas (achieved by setting request.required.acks=-1 or required.acks=all) provides the strongest durability guarantee: the message will be acked by all in-sync replicas (although possibly not by all replicas, since replicas can fall out of sync). A message that has been acknowledged by all in-sync replicas will not be lost as long as at least one of those in-sync replicas remains available.
+
+Note, however, that "acknowledged by all in-sync replicas" does not guarantee any specific number of acknowledging replicas. For example, if a topic is configured with only two replicas and one fails, only one in-sync replica remains. Writes that specify required.acks=-1 will still succeed, but could be lost if the remaining replica also fails.
+
+Although this behavior ensures maximum availability of the partition, it may be undesirable for users who prefer durability over availability. Therefore, we provide two topic-level configurations that can be used to prefer message durability over availability:
+  1. Disable unclean leader election - if all replicas become unavailable, then the partition will remain unavailable until the most recent leader becomes available again. This effectively prefers unavailability over the risk of message loss. See the previous section on Unclean Leader Election for clarification.
+  2. Specify a minimum ISR size - the partition will only accept writes if the size of the ISR is above a certain minimum, in order to prevent the loss of messages that were written to just a single replica, which subsequently becomes unavailable. This setting only takes effect if the producer uses required.acks=-1, and it guarantees that the message will be acknowledged by at least this many in-sync replicas (a producer-side sketch follows this list). This setting offers a trade-off between consistency and availability. A higher setting for the minimum ISR size guarantees better consistency, since the message is guaranteed to be written to more replicas, which reduces the probability that it will be lost. However, it reduces availability, since the partition will be unavailable for writes if the number of in-sync replicas drops below the minimum threshold.
+
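As a rough sketch of option 2 under stated assumptions: the snippet below uses the new Java producer that ships with 0.8.2, where acks=all plays the role of request.required.acks=-1, and assumes a topic (my-durable-topic, a placeholder name) created with replication factor 3 and min.insync.replicas=2. If the ISR shrinks below the minimum, the broker rejects the write and the send future completes with a NotEnoughReplicas error.

    import java.util.Properties;
    import java.util.concurrent.ExecutionException;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.errors.NotEnoughReplicasException;

    public class DurabilityFirstProducer {
        public static void main(String[] args) throws InterruptedException {
            Properties props = new Properties();
            // Placeholder broker list; replace with your own brokers.
            props.put("bootstrap.servers", "broker1:9092");
            // "all" is equivalent to -1: wait for every in-sync replica to acknowledge.
            props.put("acks", "all");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);
            try {
                // Blocking on the returned future surfaces broker-side errors for this write.
                producer.send(new ProducerRecord<String, String>("my-durable-topic", "key", "value")).get();
            } catch (ExecutionException e) {
                if (e.getCause() instanceof NotEnoughReplicasException) {
                    // The ISR has shrunk below min.insync.replicas; the write was rejected
                    // rather than being accepted by too few replicas.
                    System.err.println("Write rejected: not enough in-sync replicas");
                } else {
                    throw new RuntimeException(e.getCause());
                }
            } finally {
                producer.close();
            }
        }
    }

Blocking on the future per send is done here only to surface the error synchronously; in practice a callback passed to send can be used instead.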

Replica Management

The above discussion on replicated logs really covers only a single log, i.e. one topic partition. However, a Kafka cluster will manage hundreds or thousands of these partitions. We attempt to balance partitions within a cluster in a round-robin fashion to avoid clustering all partitions for high-volume topics on a small number of nodes. Likewise we try to balance leadership so that each node is the leader for a proportional share of its partitions.