KAFKA-17966

Controller replacement does not support scaling up before scaling down

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.9.0
    • Fix Version/s: None
    • Component/s: kraft
    • Labels: None

    Description

      In KRaft, complex quorum changes are implemented as a series of single-controller changes. When replacing a controller, it is preferable to add the new controller before removing the old one. For example, in a three-controller cluster, adding the replacement first and then removing the old controller lets the quorum tolerate one controller failure throughout the whole process. This is currently not possible because add-controller fails with DuplicateVoterException, so the operator is forced to scale down first and then scale up.
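      The fault-tolerance argument above can be checked with a little arithmetic. The following is a minimal sketch, assuming a plain majority quorum in which every listed voter is healthy; the function name tolerated_failures is hypothetical and is not part of any Kafka tool.

```shell
# Number of voter failures a majority quorum of N voters can survive:
# floor((N - 1) / 2). The function name is illustrative only.
tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

echo "steady state (3 voters):             $(tolerated_failures 3)"
echo "add first, mid-change (4 voters):    $(tolerated_failures 4)"
echo "remove first, mid-change (2 voters): $(tolerated_failures 2)"
```

      Under these assumptions, the add-first ordering never drops below one tolerated failure, while the remove-first ordering passes through a two-voter quorum that tolerates none.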

      Example:

      An operator replaces a failed disk with a new one. The replacement disk needs to be formatted, which gives it a new directory ID.

      $ CLUSTER_ID="$(bin/kafka-cluster.sh cluster-id --bootstrap-server localhost:9092 | awk -F': ' '{print $2}')"
      
      $ bin/kafka-storage.sh format \
        --config /opt/kafka/server2/config/server.properties \
        --cluster-id "$CLUSTER_ID" \
        --no-initial-controllers \
        --ignore-formatted
      Formatting metadata directory /opt/kafka/server2/metadata with metadata.version 3.9-IV0.
      

      After the controller restarts, the quorum shows two entries with node ID 2: the original incarnation, whose disk failed, still listed as a follower with an ever-growing lag, and the new incarnation, with a different directory ID, listed as an observer.

      $ bin/kafka-metadata-quorum.sh --bootstrap-controller localhost:8000 describe --replication --human-readable
      NodeId	DirectoryId           	LogEndOffset	Lag	LastFetchTimestamp	LastCaughtUpTimestamp	Status 	 
      0     	pbvuBlaTTwKRxS5NLJwRFQ	535         	0  	6 ms ago          	6 ms ago             	Leader 	 
      1     	QjRpFtVDTtCa8OLXiSbmmA	535         	0  	283 ms ago        	283 ms ago           	Follower    
      2     	slcsM5ZAR0SMIF_u__MAeg	407         	128	63307 ms ago      	63802 ms ago         	Follower    
      2     	wrqMDI1WDsqaooVSOtlgYw	535         	0  	281 ms ago        	281 ms ago           	Observer    
      8     	aXLz3ixjqzXhCYqKHRD4WQ	535         	0  	284 ms ago        	284 ms ago           	Observer    
      7     	KCriHQZm3TlxvEVNgyWKJw	535         	0  	284 ms ago        	284 ms ago           	Observer    
      9     	v5nnIwK8r0XqjyqlIPW-aw	535         	0  	284 ms ago        	284 ms ago           	Observer
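      A listing like the one above can be checked mechanically. The following is a rough sketch, assuming the column layout shown in this report (NodeId first, Lag fourth, Status last); the helper names duplicate_node_ids and observer_lag are hypothetical, and the sample rows are copied from the listing.

```shell
# Print node IDs that occur more than once in describe --replication
# output read from stdin. Only rows whose first field is numeric count.
duplicate_node_ids() {
  awk '$1 ~ /^[0-9]+$/ { seen[$1]++ }
       END { for (id in seen) if (seen[id] > 1) print id }' | sort -n
}

# Print the Lag column for the observer incarnation of node ID $1.
observer_lag() {
  awk -v id="$1" '$1 == id && $NF == "Observer" { print $4 }'
}

# Sample rows taken from the listing in this report.
sample='NodeId DirectoryId LogEndOffset Lag LastFetchTimestamp LastCaughtUpTimestamp Status
0 pbvuBlaTTwKRxS5NLJwRFQ 535 0 6 ms ago 6 ms ago Leader
1 QjRpFtVDTtCa8OLXiSbmmA 535 0 283 ms ago 283 ms ago Follower
2 slcsM5ZAR0SMIF_u__MAeg 407 128 63307 ms ago 63802 ms ago Follower
2 wrqMDI1WDsqaooVSOtlgYw 535 0 281 ms ago 281 ms ago Observer'

printf '%s\n' "$sample" | duplicate_node_ids
printf '%s\n' "$sample" | observer_lag 2
```

      In practice the functions would read the live output of kafka-metadata-quorum.sh describe --replication instead of the embedded sample.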
      

      Once the new controller has caught up with the leader, we try to scale up by adding it as a voter.

      $ bin/kafka-metadata-quorum.sh \
        --bootstrap-controller localhost:8000 \
        --command-config /opt/kafka/server2/config/server.properties \
        add-controller
      org.apache.kafka.common.errors.DuplicateVoterException: The voter id for ReplicaKey(id=2, directoryId=Optional[u7e_mCmg0VAIz0zuAOcraA]) is already part of the set of voters [ReplicaKey(id=0, directoryId=Optional[PbEthh6mR8iVNizvUTUVFw]), ReplicaKey(id=1, directoryId=Optional[kIpbbU79QaCIIiOLOyCjJg]), ReplicaKey(id=2, directoryId=Optional[2ab0gajpS5aUf5d-2Jw02w])].
      java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.DuplicateVoterException: The voter id for ReplicaKey(id=2, directoryId=Optional[u7e_mCmg0VAIz0zuAOcraA]) is already part of the set of voters [ReplicaKey(id=0, directoryId=Optional[PbEthh6mR8iVNizvUTUVFw]), ReplicaKey(id=1, directoryId=Optional[kIpbbU79QaCIIiOLOyCjJg]), ReplicaKey(id=2, directoryId=Optional[2ab0gajpS5aUf5d-2Jw02w])].
      	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
      	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
      	at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
      	at org.apache.kafka.tools.MetadataQuorumCommand.handleAddController(MetadataQuorumCommand.java:431)
      	at org.apache.kafka.tools.MetadataQuorumCommand.execute(MetadataQuorumCommand.java:147)
      	at org.apache.kafka.tools.MetadataQuorumCommand.mainNoExit(MetadataQuorumCommand.java:81)
      	at org.apache.kafka.tools.MetadataQuorumCommand.main(MetadataQuorumCommand.java:76)
      Caused by: org.apache.kafka.common.errors.DuplicateVoterException: The voter id for ReplicaKey(id=2, directoryId=Optional[u7e_mCmg0VAIz0zuAOcraA]) is already part of the set of voters [ReplicaKey(id=0, directoryId=Optional[PbEthh6mR8iVNizvUTUVFw]), ReplicaKey(id=1, directoryId=Optional[kIpbbU79QaCIIiOLOyCjJg]), ReplicaKey(id=2, directoryId=Optional[2ab0gajpS5aUf5d-2Jw02w])].
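      For comparison, the ordering that the description says operators are currently forced into (scale down, then scale up) might look like the sketch below. It only prints the two commands so they can be reviewed before running. The helper name print_replace_plan is hypothetical, and the directory ID in the usage line is the old incarnation's ID from the describe listing in this report, which must be replaced with the value from a real cluster.

```shell
# Print (do not run) the scale-down-then-scale-up sequence for one controller.
# Arguments: node ID, directory ID of the old incarnation,
#            bootstrap controller, properties file of the new controller.
print_replace_plan() {
  node_id="$1"; old_dir_id="$2"; bootstrap="$3"; config="$4"
  # Step 1: remove the old incarnation from the voter set.
  printf 'bin/kafka-metadata-quorum.sh --bootstrap-controller %s remove-controller --controller-id %s --controller-directory-id %s\n' \
    "$bootstrap" "$node_id" "$old_dir_id"
  # Step 2: add the new incarnation described by the properties file.
  printf 'bin/kafka-metadata-quorum.sh --bootstrap-controller %s --command-config %s add-controller\n' \
    "$bootstrap" "$config"
}

print_replace_plan 2 slcsM5ZAR0SMIF_u__MAeg localhost:8000 \
  /opt/kafka/server2/config/server.properties
```

      Between the two steps the quorum has only two voters, which is exactly the window without fault tolerance that this issue asks to avoid.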
      

          People

            Assignee: Unassigned
            Reporter: Federico Valeri (fvaleri)