KAFKA-691

Fault tolerance broken with replication factor 1

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:
      None

      Description

      In 0.7, if a partition was down we would just send the message elsewhere. This meant that the partitioning was really more of a "stickiness" than a hard guarantee, which made it impossible to depend on it for partitioned, stateful processing.

      In 0.8, when running with replication, this should generally not be a problem, as the partitions are now highly available and fail over to other replicas. However, with replication factor = 1 this no longer really works in most cases, because a dead broker now produces errors for the partitions it hosts.

      I am not sure of the best fix. Intuitively, I think this is something that should be handled by the Partitioner interface. However, the partitioner currently has no knowledge of which nodes are available. So you could use a random partitioner, but it would keep going back to the down node.
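
      To make the idea concrete, here is a minimal sketch of a partitioner that is told which partitions are currently available. It is written in Python for brevity and is not Kafka's Partitioner interface: the names choose_partition, PartitionUnavailableError and the available argument are hypothetical. It assumes that keyed messages should keep a hard key-to-partition mapping (failing rather than being rerouted), while unkeyed messages may go to any live partition.

```python
import random
import zlib
from typing import Optional, Sequence


class PartitionUnavailableError(Exception):
    """Raised when the partition a message must go to is not available."""


def choose_partition(
    key: Optional[str],
    num_partitions: int,
    available: Sequence[int],
    rng: Optional[random.Random] = None,
) -> int:
    """Pick a partition index (hypothetical helper, not Kafka's API).

    key            -- message key, or None for an unkeyed message
    num_partitions -- total number of partitions in the topic
    available      -- indices of partitions whose broker is currently up
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")

    if key is not None:
        # Keyed message: the mapping must be stable, so use a deterministic
        # hash (CRC32) rather than Python's per-process randomized hash().
        partition = zlib.crc32(key.encode("utf-8")) % num_partitions
        if partition not in available:
            # Assumption: fail loudly instead of silently rerouting, so the
            # partitioning stays a hard guarantee for stateful consumers.
            raise PartitionUnavailableError(
                f"partition {partition} for key {key!r} is unavailable"
            )
        return partition

    # Unkeyed message: any live partition will do, so never pick a dead one.
    if not available:
        raise PartitionUnavailableError("no partitions are available")
    return (rng or random).choice(list(available))
```

      With this shape, a random partitioner stops going back to the down node because it only draws from the available list, and keyed traffic surfaces an error instead of quietly losing its partitioning guarantee. Whether the partitioner or the producer around it should own the availability information is a design choice this sketch does not settle.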

      1. KAFKA-691-v2.patch
        7 kB
        Maxime Brugidou
      2. KAFKA-691-v1.patch
        6 kB
        Maxime Brugidou
      3. kafka-691_extra.patch
        3 kB
        Jun Rao

            People

            • Assignee:
              Maxime Brugidou
              Reporter:
              Jay Kreps
            • Votes:
              2
              Watchers:
              5
