Index: configuration.html
===================================================================
--- configuration.html (revision 1634108)
+++ configuration.html (working copy)
@@ -419,6 +419,12 @@
This configuration controls how frequently the log compactor will attempt to clean the log (assuming log compaction is enabled). By default we will avoid cleaning a log where more than 50% of the log has been compacted. This ratio bounds the maximum space wasted in the log by duplicates (at 50% at most 50% of the log could be duplicates). A higher ratio will mean fewer, more efficient cleanings but will mean more wasted space in the log. |
+ | min.insync.replicas |
+ 1 |
+ min.insync.replicas |
+ If the number of in-sync replicas drops below this threshold, writing messages with -1 (or all) required acks will throw a NotEnoughReplicas exception. This is used to provide an extra durability guarantee: not only do all in-sync replicas acknowledge the message, we also guarantee that the message was acknowledged by at least this many replicas
+
+
| retention.bytes |
None |
log.retention.bytes |
@@ -604,7 +610,7 @@
- 0, which means that the producer never waits for an acknowledgement from the broker (the same behavior as 0.7). This option provides the lowest latency but the weakest durability guarantees (some data will be lost when a server fails).
- 1, which means that the producer gets an acknowledgement after the leader replica has received the data. This option provides better durability as the client waits until the server acknowledges the request as successful (only messages that were written to the now-dead leader but not yet replicated will be lost).
-
- -1, which means that the producer gets an acknowledgement after all in-sync replicas have received the data. This option provides the best durability, we guarantee that no messages will be lost as long as at least one in sync replica remains.
+
+ - -1, which means that the producer gets an acknowledgement after all in-sync replicas have received the data. This option provides the best durability; we guarantee that no messages will be lost as long as at least one in-sync replica remains. Note that this does not completely eliminate the risk of message loss - please read the Replication section of the design documentation for a more in-depth discussion and additional settings that increase durability. A short producer example follows below.
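+
+For illustration, here is a minimal sketch of a producer that requests acknowledgement from all in-sync replicas. It assumes the 0.8.x Java producer API (kafka.javaapi.producer.Producer); the broker list and topic name are placeholders and should be adapted to your cluster.
+
+    import java.util.Properties;
+
+    import kafka.javaapi.producer.Producer;
+    import kafka.producer.KeyedMessage;
+    import kafka.producer.ProducerConfig;
+
+    public class AckAllProducerSketch {
+        public static void main(String[] args) {
+            Properties props = new Properties();
+            // Placeholder broker list - replace with brokers from your cluster.
+            props.put("metadata.broker.list", "broker1:9092,broker2:9092");
+            props.put("serializer.class", "kafka.serializer.StringEncoder");
+            // Wait for acknowledgement from all in-sync replicas.
+            props.put("request.required.acks", "-1");
+
+            Producer<String, String> producer =
+                new Producer<String, String>(new ProducerConfig(props));
+            // "my-topic" is a placeholder topic name.
+            producer.send(new KeyedMessage<String, String>("my-topic", "key", "value"));
+            producer.close();
+        }
+    }
+
+When this producer setting is combined with the topic-level min.insync.replicas setting described above, a write fails rather than being accepted by fewer replicas than the configured minimum.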
Index: design.html
===================================================================
--- design.html (revision 1634108)
+++ design.html (working copy)
@@ -227,6 +227,19 @@
This dilemma is not specific to Kafka. It exists in any quorum-based scheme. For example in a majority voting scheme, if a majority of servers suffer a permanent failure, then you must either choose to lose 100% of your data or violate consistency by taking what remains on an existing server as your new source of truth.
+
+
Availability and Durability Guarantees
+
+When writing to Kafka, producers can choose whether they wait for the message to be acknowledged by 0, 1 or all (-1) brokers. Acknowledgement by all brokers (achieved by setting request.required.acks=-1 or required.acks=all) provides the best durability guarantee - a message that was acked will not be lost as long as at least one in-sync replica remains.
+
+Note, however, that "acknowledged by all in-sync replicas" does not guarantee that any specific number of replicas has acknowledged the message. For example, if a topic is configured with only two replicas and one fails, we are left with only one in-sync replica. Writes that specify required.acks=-1 will still succeed, but could be lost if the remaining replica also fails.
+
+To avoid this unfortunate situation, we provide two topic-level configurations that can be used to prefer message durability over availability:
+
+ - Disable unclean leader election - if all replicas fail, the partition will remain unavailable until the most recent leader is restored. This prevents choosing a new leader that does not have the most recent messages, thus preferring a longer period of unavailability over the risk of message loss. See the previous section on Unclean Leader Election for clarification.
+ - Specify a minimum ISR size - the partition will only accept writes if the size of the ISR is above a certain minimum, in order to prevent the loss of messages that were written to only a single replica, which then failed. This setting only takes effect if the producer uses required.acks=-1; it guarantees that the message was acked not only by all in-sync replicas, but also by at least this many replicas. A topic configuration sketch follows below.
+
+
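+As a rough illustration, the sketch below creates a topic that prefers durability over availability by combining both settings. It assumes the 0.8.x admin utilities (kafka.admin.AdminUtils together with the bundled ZkClient); the ZooKeeper connect string, topic name, and partition/replica counts are placeholders.
+
+    import java.util.Properties;
+
+    import kafka.admin.AdminUtils;
+    import kafka.utils.ZKStringSerializer$;
+    import org.I0Itec.zkclient.ZkClient;
+
+    public class DurableTopicSketch {
+        public static void main(String[] args) {
+            // Placeholder ZooKeeper connect string and timeouts.
+            ZkClient zkClient = new ZkClient("zookeeper:2181", 10000, 10000,
+                    ZKStringSerializer$.MODULE$);
+
+            Properties topicConfig = new Properties();
+            // Reject writes with required.acks=-1 unless at least 2 replicas are in sync.
+            topicConfig.put("min.insync.replicas", "2");
+            // Never elect an out-of-sync replica as leader.
+            topicConfig.put("unclean.leader.election.enable", "false");
+
+            // Placeholder topic: 3 partitions, replication factor 3.
+            AdminUtils.createTopic(zkClient, "durable-topic", 3, 3, topicConfig);
+            zkClient.close();
+        }
+    }
+
+Note that producers must still send with required.acks=-1 to benefit from the minimum ISR size; with min.insync.replicas=2, such writes fail when fewer than two replicas are in sync instead of being silently accepted by a single replica.
+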
Replica Management
The above discussion on replicated logs really covers only a single log, i.e. one topic partition. However a Kafka cluster will manage hundreds or thousands of these partitions. We attempt to balance partitions within a cluster in a round-robin fashion to avoid clustering all partitions for high-volume topics on a small number of nodes. Likewise we try to balance leadership so that each node is the leader for a proportional share of its partitions.