When quorum encryption uses SSL certificates issued for public DNS names that resolve to IPv6 addresses (as obtained from a public service such as "Let's Encrypt"), Zookeeper validates the certificate's SANs (Subject Alternative Names) against the IP address instead of the configured DNS name.
As a result, these certificates cannot be used, because no certificates are issued for IPv6 addresses.
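The following Python sketch shows why the check fails. It is a simplified, RFC 2818/RFC 6125-style matcher, not Zookeeper's Java implementation: a DNS name is compared only against dNSName SAN entries, and an IP literal only against iPAddress SAN entries. The certificate contents are assumptions that mirror this report: one dNSName and no IP entries. The "resolved" address `2001:db8::3` is a hypothetical documentation-prefix value, and no real DNS lookup is performed. Wildcard SANs are deliberately left out.

```python
import ipaddress

# Assumed SAN contents of a Let's Encrypt certificate for one quorum member.
CERT_DNS_SANS = ["zookeeper3.ourdomain.cloud"]
CERT_IP_SANS: list[str] = []  # public CAs in this setup issue no IP SANs

# Hypothetical stand-in for the AAAA record; not a real lookup.
RESOLVED = {"zookeeper3.ourdomain.cloud": "2001:db8::3"}


def san_matches(peer: str, dns_sans: list[str], ip_sans: list[str]) -> bool:
    """Simplified SAN check: IP literals vs iPAddress SANs, names vs dNSName SANs.

    Wildcard dNSName entries are not handled in this sketch.
    """
    try:
        addr = ipaddress.ip_address(peer)
    except ValueError:
        addr = None
    if addr is not None:
        # An IP literal can only match an iPAddress SAN, never a dNSName.
        return any(addr == ipaddress.ip_address(ip) for ip in ip_sans)
    # DNS names compare case-insensitively, ignoring a trailing root dot.
    name = peer.rstrip(".").lower()
    return any(name == san.lower() for san in dns_sans)


def verify_against_configured_name(host: str) -> bool:
    # What the report expects: check the DNS name from zookeeper.properties.
    return san_matches(host, CERT_DNS_SANS, CERT_IP_SANS)


def verify_against_resolved_address(host: str) -> bool:
    # What the logs suggest happens: check the address the name resolved to.
    return san_matches(RESOLVED[host], CERT_DNS_SANS, CERT_IP_SANS)
```

With these assumptions, `verify_against_configured_name("zookeeper3.ourdomain.cloud")` succeeds, while `verify_against_resolved_address` fails for the same peer because the certificate carries no iPAddress SAN to compare the IPv6 address against.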
This has been observed with Zookeeper version 3.5.9, which is the version bundled with the most recent Kafka release (2.8.1).
The affected environment is a 3-node Zookeeper cluster, configured as follows in zookeeper.properties (note the DNS names):
Each of these DNS records has only a public IPv6 (AAAA) entry.
The SSL certificates from Let's Encrypt are requested and added to the quorum keystores as follows:
- Using acme.sh (https://github.com/acmesh-official/acme.sh)
- Requesting the certificate from Let's Encrypt, for each system, using:
- Merging the fullchain and certificate files into a single PKCS12 file using:
- Adding the resulting PKCS12 file to the quorum keystore:
When any of the systems tries to initiate the quorum connection, its log states that the remote certificate could not be verified because the SAN list does not contain the IPv6 address.
For example, this is the log from zookeeper2.ourdomain.cloud when connecting to zookeeper3.ourdomain.cloud:
I think the cited log lines clearly show:
- Zookeeper is picking up the correct certificate from the quorum keystore: the log states that the request does not match any SAN and lists only zookeeper3.ourdomain.cloud, which it can only know from the certificate itself.
- Zookeeper is validating the wrong thing here: even though the config clearly specifies a DNS name, the certificate's SANs are validated against the IPv6 address that the record resolves to instead of the configured DNS name (ERROR Failed to verify host address: 2a01:-