  1. Kafka
  2. KAFKA-12793

Client-side Circuit Breaker for Partition Write Errors

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: clients
    • Labels: None

    Description

      When Kafka is used to build data pipelines in mission-critical business scenarios, availability and throughput are the most important operational goals, and they must be maintained in the presence of transient or permanent local failures. One typical situation that requires Ops intervention is disk failure: some partitions suffer long write latency caused by extremely high disk utilization. Because all partitions share the same buffer under the current producer thread model, the buffer fills up quickly and the healthy partitions are eventually impacted as well. The cluster-level success rate and timeout ratio degrade until the local infrastructure issue is resolved.

      One way to mitigate this issue is to add a client-side mechanism that short-circuits problematic partitions during transient failures. A similar approach is applied in other distributed systems and RPC frameworks.

      We propose adding a configuration-driven circuit-breaking mechanism that allows the Kafka client to ‘mute’ partitions when certain conditions are met. The mechanism adds callbacks to the Sender class workflow that allow partitions to be filtered according to a policy.
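
      To make the ‘mute when a condition is met’ idea concrete, here is a minimal sketch of one possible policy: mute a partition after a number of consecutive write failures, then let it back in after a cool-down. The class name PartitionCircuitBreaker, its methods, and both parameters are hypothetical and chosen for illustration only; they are not part of the Kafka client API or of the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-partition circuit breaker; not part of the Kafka client API. */
class PartitionCircuitBreaker {
    private final int failureThreshold;  // consecutive failures that trip the breaker
    private final long muteDurationMs;   // how long a tripped partition stays muted
    private final Map<Integer, Integer> consecutiveFailures = new HashMap<>();
    private final Map<Integer, Long> mutedUntilMs = new HashMap<>();

    PartitionCircuitBreaker(int failureThreshold, long muteDurationMs) {
        if (failureThreshold < 1 || muteDurationMs < 0) {
            throw new IllegalArgumentException("invalid circuit breaker settings");
        }
        this.failureThreshold = failureThreshold;
        this.muteDurationMs = muteDurationMs;
    }

    /** Record a failed write; mutes the partition once the threshold is reached. */
    synchronized void recordFailure(int partition, long nowMs) {
        int failures = consecutiveFailures.merge(partition, 1, Integer::sum);
        if (failures >= failureThreshold) {
            mutedUntilMs.put(partition, nowMs + muteDurationMs);
            consecutiveFailures.remove(partition);
        }
    }

    /** Record a successful write; resets the consecutive-failure count. */
    synchronized void recordSuccess(int partition) {
        consecutiveFailures.remove(partition);
    }

    /** True while the partition is inside its mute window. */
    synchronized boolean isMuted(int partition, long nowMs) {
        Long until = mutedUntilMs.get(partition);
        if (until == null) {
            return false;
        }
        if (nowMs >= until) {
            mutedUntilMs.remove(partition); // cool-down elapsed, allow writes again
            return false;
        }
        return true;
    }
}
```

      Time is passed in explicitly so the policy can be exercised without a real clock. A production policy would probably also look at latency or timeout ratio rather than failure counts alone.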

      The client can choose an implementation that fits a particular failure scenario by providing client-side custom implementations of Partitioner and ProducerInterceptor:

      • Customize the implementation of ProducerInterceptor and choose the strategy for muting partitions.
      • Customize the implementation of Partitioner and choose the strategy for filtering partitions.
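
      The filtering half of the two bullets above can be sketched as a small helper that a custom Partitioner might delegate to from its partition(...) method, with the muted set maintained elsewhere (for example from ProducerInterceptor.onAcknowledgement). The helper name MutedPartitionSelector and its signature are hypothetical, and Arrays.hashCode is only a stand-in for whatever key hash the client really uses, so the example stays free of Kafka dependencies.

```java
import java.util.Arrays;
import java.util.Set;

/** Hypothetical helper; a custom Partitioner could delegate to it. */
final class MutedPartitionSelector {
    private MutedPartitionSelector() {}

    /**
     * Picks the key's usual partition unless it is muted, in which case the
     * next unmuted partition (wrapping around) is used instead.
     */
    static int choose(byte[] keyBytes, int numPartitions, Set<Integer> muted) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Assumption: Arrays.hashCode stands in for the client's real key hash.
        int preferred = Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
        for (int i = 0; i < numPartitions; i++) {
            int candidate = (preferred + i) % numPartitions;
            if (!muted.contains(candidate)) {
                return candidate;
            }
        }
        // Every partition is muted: fall back to the usual one rather than drop the record.
        return preferred;
    }
}
```

      With this policy a key leaves its usual partition only while that partition is muted and returns to it once the mute is lifted, so records for one key can land on two partitions around a failure.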

      Muting partitions has an impact when the topic contains keyed messages, because messages with the same key may be written to more than one partition during the recovery period. We believe this can be an explicit trade-off that the application makes between availability and message ordering.

      KIP-693: https://cwiki.apache.org/confluence/display/KAFKA/KIP-693%3A+Client-side+Circuit+Breaker+for+Partition+Write+Errors

      Attachments

        1. KAFKA-12793.patch
          30 kB
          Kahn Cheny

        Activity

          People

            Assignee: Kahn Cheny
            Reporter: Kahn Cheny
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated: