Kafka / KAFKA-6718

Rack Aware Stand-by Task Assignment for Kafka Streams


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2.0
    • Component/s: streams

    Description

      Machines in a data centre are often grouped into racks. Racks provide isolation, as each rack may be in a different physical location and have its own power source. When tasks are properly replicated across racks, this provides fault tolerance: if one rack goes down, the remaining racks can continue to serve traffic.
       
      This feature is already implemented at the broker level in Kafka via KIP-36, but we need similar support for task assignment at the Kafka Streams application layer.
       
      This feature enables replica (standby) tasks to be assigned to different racks for fault tolerance.
      NUM_STANDBY_REPLICAS = x
      totalTasks = x + 1 (replicas + active)
      1. If no rackId is provided: the cluster behaves rack-unaware.
      2. If the same rackId is given to all nodes: the cluster behaves rack-unaware.
      3. If totalTasks <= number of racks: the cluster is rack-aware, i.e. each replica task is assigned to a different rack.
      4. If totalTasks > number of racks: tasks are first spread across the different racks; any remaining tasks are assigned to the least-loaded nodes, cluster-wide.
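The four rules above can be sketched as a small, self-contained Java example. This is an illustration of the described behaviour only, not the actual Kafka Streams implementation; the class and method names, and the host/load inputs, are assumptions for this sketch:

```java
import java.util.*;

// Illustrative sketch of the standby-assignment rules described above
// (not the actual Kafka Streams assignor code).
public class RackAwareStandbyAssignment {

    /**
     * Picks hosts for one active task plus its standby replicas.
     * rackByHost maps each candidate host to its rack id; if rack ids are
     * missing or identical, assignment degrades to rack-unaware (rules 1-2).
     */
    static List<String> assign(Map<String, String> rackByHost,
                               Map<String, Integer> loadByHost,
                               int numStandbyReplicas) {
        int totalTasks = numStandbyReplicas + 1; // active + replicas
        List<String> chosen = new ArrayList<>();
        Set<String> usedRacks = new HashSet<>();
        Set<String> usedHosts = new HashSet<>();

        // Sort hosts by current load so ties fall to the least-loaded host.
        List<String> hosts = new ArrayList<>(rackByHost.keySet());
        hosts.sort(Comparator.comparingInt(loadByHost::get));

        // Pass 1: at most one task per distinct rack (rule 3, first half of rule 4).
        for (String host : hosts) {
            if (chosen.size() == totalTasks) break;
            String rack = rackByHost.get(host);
            if (rack == null || !usedRacks.contains(rack)) {
                chosen.add(host);
                usedHosts.add(host);
                if (rack != null) usedRacks.add(rack);
            }
        }
        // Pass 2: remaining tasks go to the least-loaded unused hosts,
        // cluster-wide (second half of rule 4).
        for (String host : hosts) {
            if (chosen.size() == totalTasks) break;
            if (!usedHosts.contains(host)) {
                chosen.add(host);
                usedHosts.add(host);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        Map<String, String> racks = new LinkedHashMap<>();
        racks.put("h1", "rack-a");
        racks.put("h2", "rack-a");
        racks.put("h3", "rack-b");
        Map<String, Integer> load = Map.of("h1", 0, "h2", 1, "h3", 2);

        // 1 active + 1 standby across 2 racks: one host is picked per rack.
        System.out.println(assign(racks, load, 1)); // [h1, h3]
    }
}
```

With NUM_STANDBY_REPLICAS = 2 and only two racks, the third task falls through to the least-loaded remaining host, matching rule 4.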

      We have added another config in StreamsConfig called RACK_ID_CONFIG, which helps the StickyPartitionAssignor assign tasks such that no two replica tasks are on the same rack where possible.
      Beyond that, it also helps maintain stickiness within a rack.
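As a rough illustration, setting the proposed rack id alongside the standby-replica count might look like the following. The literal key "rack.id" is an assumption based on this ticket's RACK_ID_CONFIG and may differ from the config name actually released:

```java
import java.util.Properties;

public class RackConfigExample {
    // Builds a Streams-style Properties object with a rack id set.
    static Properties rackAwareProps(String rackId) {
        Properties props = new Properties();
        props.put("application.id", "rack-aware-demo"); // standard Streams config key
        props.put("num.standby.replicas", "1");         // x standbys => totalTasks = 2
        // Hypothetical key for the RACK_ID_CONFIG described in this ticket;
        // the released config name may differ.
        props.put("rack.id", rackId);
        return props;
    }

    public static void main(String[] args) {
        System.out.println(rackAwareProps("rack-a").getProperty("rack.id"));
    }
}
```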

      Attachments

      Issue Links

      Activity

      People

      Levani Kokhreidze (lkokhreidze)
      Deepak Goyal (_deepakgoyal)
      Ashish Surana
      Votes: 0
      Watchers: 12

      Dates

      Created:
      Updated:
      Resolved: