It seems that after updating a listener's SSL certificate with a dynamic broker configuration update, the old certificate is still used by the broker's client-side SSL factory. Because of this, the broker fails to create new connections to the controller after the old certificate expires.
KAFKA-8336 described an issue where the client-side SSL factory did not pick up a certificate that was changed with dynamic configuration. That bug was fixed in version 2.3, and I can confirm that dynamic update worked for us with Kafka 2.4. But we have now updated our clusters to 2.7 and see this (or at least a similar) problem again.
We first saw this on Confluent 6.1.2, which (I think) is based on Kafka 2.7.0. Then I tried vanilla versions 2.7.0 and 2.7.2 and can reproduce the problem on both.
Steps to reproduce:
- Have ZooKeeper running somewhere (in my example it is at "10.88.0.21:2181").
- Get vanilla version 2.7.2 (or 2.7.0) from https://kafka.apache.org/downloads .
- Make a basic broker config like this (don't forget to actually create log.dirs):
(I use TLS 1.2 here just so I can see the client certificate in the TLS handshake in a traffic dump; you will get the same error with the default TLS 1.3 too.)
- Repeat this config for the other 2 brokers, changing the id, listener port and certificate accordingly.
- Make a basic client config (I use one of the brokers' certificates for it):
- Create the usual local self-signed PKI for testing:
  - generate a self-signed CA certificate and private key, and place the certificate in the truststore
  - create keys for the broker certificates and create signing requests from them as usual (I use the same subject for all brokers here)
  - create 2 certificates as usual
  - use the "faketime" utility to make the third certificate expire soon:
  - create keystores from the certificates and place them according to the broker configs from earlier
- Run the 3 brokers with your configs like
(I start them here without daemon mode to see the logs right in the terminal; just use "tmux" or something similar to run 3 brokers simultaneously.)
- You can check that one broker's certificate will expire soon with
- Issue a new certificate to replace the one that will expire soon, put it in a keystore, and replace the old keystore with it.
- Use dynamic configuration to make the broker re-read the keystore:
- You can check that the broker now presents the new certificate on its listener with the same command.
- Wait until the old certificate expires, then make some change that provokes the broker to make a new controller connection. For example, if the controller is on broker "1" and the expired certificate was on broker "2", then I restart broker "3".
- On the broker with the expired certificate you will see something like this in the log
and the controller log will show something like
and if the broker with the expired and replaced certificate is itself the controller, it cannot even connect to itself.
- If you make a traffic dump (and you use TLS 1.2 or lower), you will see that the broker's client connection tries to use the old certificate in the TLS handshake.
Here is an example traffic dump, taken when the broker with the expired and dynamically replaced certificate is the current controller, so it can't connect to itself: failed-controller-single-session-20211119.pcap.gz
In this example you will see that the "Server" uses the new certificate and the "Client" uses the old certificate, but it's the same broker!
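For reference, the certificate checks, the short-lived certificate and the dynamic keystore update from the steps above could be sketched as the shell helpers below. This is only a sketch and not the exact commands used in this report: the function names, the listener name and all paths are placeholders, and it assumes that `openssl`, `faketime` and the stock `bin/kafka-configs.sh` (run from the Kafka distribution directory) are available.

```shell
#!/bin/sh
# Sketch only. Function names, arguments and paths are hypothetical
# placeholders, not values taken from the original report.

# Print subject, serial and validity dates of the certificate that a
# TLS listener presents.
# Usage: listener_cert_info HOST:PORT
listener_cert_info() {
    openssl s_client -connect "$1" </dev/null 2>/dev/null \
        | openssl x509 -noout -subject -serial -dates
}

# Return 0 if the PEM certificate in FILE expires within SECONDS,
# 1 if it does not, 2 if the file is not readable.
# Usage: cert_expires_within FILE SECONDS
cert_expires_within() {
    [ -r "$1" ] || return 2
    # "openssl x509 -checkend N" exits with zero when the certificate is
    # still valid N seconds from now, and non-zero when it will have expired.
    if openssl x509 -in "$1" -noout -checkend "$2" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}

# Sign a CSR with the test CA while faking the system clock, so that the
# resulting one-day certificate expires shortly after the real "now".
# Usage: sign_backdated 'YYYY-MM-DD hh:mm:ss' CSR CA_CERT CA_KEY OUT_CERT
sign_backdated() {
    faketime "$1" openssl x509 -req -in "$2" -CA "$3" -CAkey "$4" \
        -CAcreateserial -days 1 -out "$5"
}

# Ask one broker to re-read its keystore through dynamic configuration.
# LISTENER is the lower-case listener name from the broker config.
# Usage: reload_keystore BOOTSTRAP CLIENT_PROPERTIES BROKER_ID LISTENER KEYSTORE
reload_keystore() {
    bin/kafka-configs.sh --bootstrap-server "$1" --command-config "$2" \
        --entity-type brokers --entity-name "$3" --alter \
        --add-config "listener.name.$4.ssl.keystore.location=$5"
}
```

With placeholder values, `listener_cert_info localhost:9093` would show which certificate the listener serves before and after the update, and `reload_keystore localhost:9093 client.properties 2 ssl /path/to/broker2.keystore.jks` would trigger the reload. Note that these checks only cover the listener (server) side; the certificate used on the broker's outgoing client connection is only visible in a traffic dump like the one above.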