Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.7.0, 3.6.1, 3.8.0
Description
We are migrating our Kafka cluster from ZooKeeper to KRaft mode. We run individual brokers and controllers with TLS enabled, and all endpoints are configured with IP addresses.
The TLS-enabled setup works fine among the brokers, and the certificate looks something like:
Common Name: *.kafka.service.consul
Subject Alternative Names: *.kafka.service.consul, IP Address:10.87.170.78
Note:
- The DNS name of the node does not match the CN, but since we use IPs for communication, we have provided the IPs as SANs.
- The same applies to the controllers: IPs are given as SANs in the certificate.
- The issue is not related to the migration, so we are only sharing the configuration relevant to TLS.
In the current setup I am running 3 brokers and 3 controllers.
CONTROLLER:
Relevant controller configurations from one of the controllers:
KAFKA_CFG_PROCESS_ROLES=controller
KAFKA_KRAFT_CLUSTER_ID=5kztjhJ4SxSu-kdiEYDUow
KAFKA_CFG_NODE_ID=6
KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=4@10.87.170.83:9097,5@10.87.170.9:9097,6@10.87.170.6:9097
KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
KAFKA_CFG_INTER_BROKER_LISTENER_NAME=INSIDE_SSL
KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:SSL,INSIDE_SSL:SSL
KAFKA_CFG_LISTENERS=CONTROLLER://10.87.170.6:9097
Controller certificate has:
Common Name: *.kafka.service.consul
Subject Alternative Names: *.kafka.service.consul, IP Address:10.87.170.6
BROKER:
Relevant broker configuration from one of the brokers:
KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
KAFKA_CFG_INTER_BROKER_LISTENER_NAME=INSIDE_SSL
KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=4@10.87.170.83:9097,5@10.87.170.9:9097,6@10.87.170.6:9097
KAFKA_CFG_PROCESS_ROLES=broker
KAFKA_CFG_NODE_ID=3
KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=INSIDE_SSL:SSL,OUTSIDE_SSL:SSL,CONTROLLER:SSL
KAFKA_CFG_LISTENERS=INSIDE_SSL://10.87.170.78:9093,OUTSIDE_SSL://10.87.170.78:9096
KAFKA_CFG_ADVERTISED_LISTENERS=INSIDE_SSL://10.87.170.78:9093,OUTSIDE_SSL://10.87.170.78:9096
Broker certificate has:
Common Name: *.kafka.service.consul
Subject Alternative Names: *.kafka.service.consul, IP Address:10.87.170.78
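As a quick sanity check of the configuration above, the following Python sketch parses the KAFKA_CFG_CONTROLLER_QUORUM_VOTERS value and reports whether each voter host is an IP literal or a DNS name. The helper names (parse_quorum_voters, is_ip_literal) are made up for this illustration and are not part of Kafka.

```python
import ipaddress


def parse_quorum_voters(value):
    """Parse 'id@host:port,...' into a list of (node_id, host, port) tuples."""
    voters = []
    for entry in value.split(","):
        node_id, _, endpoint = entry.strip().partition("@")
        # rpartition keeps everything before the last ':' as the host.
        host, _, port = endpoint.rpartition(":")
        voters.append((int(node_id), host, int(port)))
    return voters


def is_ip_literal(host):
    """Return True when host is an IPv4/IPv6 literal rather than a DNS name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


# Value copied from the controller and broker configuration in this ticket.
VOTERS = "4@10.87.170.83:9097,5@10.87.170.9:9097,6@10.87.170.6:9097"

for node_id, host, port in parse_quorum_voters(VOTERS):
    kind = "ip-literal" if is_ip_literal(host) else "dns-name"
    print(node_id, host, port, kind)
```

All three voters are configured as IP literals, so any host name that shows up in the handshake errors was not taken from this setting directly.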
ISSUE 1:
With this setup, the Kafka broker fails to connect to the controller with the following error:
[2024-05-22 17:53:46,413] ERROR [broker-2-to-controller-heartbeat-channel-manager]: Request BrokerRegistrationRequestData(brokerId=2, clusterId='5kztjhJ4SxSu-kdiEYDUow', incarnationId=7741fgH6T4SQqGsho8E6mw, listeners=[Listener(name='INSIDE_SSL', host='10.87.170.81', port=9093, securityProtocol=1), Listener(name='INSIDE', host='10.87.170.81', port=9094, securityProtocol=0), Listener(name='OUTSIDE', host='10.87.170.81', port=9092, securityProtocol=0), Listener(name='OUTSIDE_SSL', host='10.87.170.81', port=9096, securityProtocol=1)], features=[Feature(name='metadata.version', minSupportedVersion=1, maxSupportedVersion=19)], rack=null, isMigratingZkBroker=false, logDirs=[TJssfKDD-iBFYfIYCKOcew], previousBrokerEpoch=-1) failed due to authentication error with controller (kafka.server.NodeToControllerRequestThread)
org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake failed
Caused by: javax.net.ssl.SSLHandshakeException: No subject alternative DNS name matching cp-internal-onecloud-kfkc1.node.cp-internal-onecloud.consul found.
    at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:378)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:321)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:316)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1351)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1226)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1169)
    at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:396)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:480)
    at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1277)
    at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1264)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
    at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask.run(SSLEngineImpl.java:1209)
    at org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks(SslTransportLayer.java:435)
    at org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(SslTransportLayer.java:523)
    at org.apache.kafka.common.network.SslTransportLayer.doHandshake(SslTransportLayer.java:373)
    at org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:293)
    at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:178)
    at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:543)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:481)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:585)
    at org.apache.kafka.server.util.InterBrokerSendThread.pollOnce(InterBrokerSendThread.java:109)
    at kafka.server.NodeToControllerRequestThread.doWork(NodeToControllerChannelManager.scala:382)
    at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:131)
Caused by: java.security.cert.CertificateException: No subject alternative DNS name matching cp-internal-onecloud-kfkc1.node.cp-internal-onecloud.consul found.
    at java.base/sun.security.util.HostnameChecker.matchDNS(HostnameChecker.java:212)
    at java.base/sun.security.util.HostnameChecker.match(HostnameChecker.java:103)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:458)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:418)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:292)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:144)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1329)
    ... 19 more
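The matchDNS frame in the trace above indicates that the peer was identified by a host name, so only the DNS-type SANs were consulted and the IP SAN was never compared. The following Python sketch is a simplified model of that branching, not the JDK's code: the function names are hypothetical, and wildcard handling is reduced to a whole leftmost label. It uses the controller certificate SANs listed in this ticket and, purely for illustration, the host name from the error.

```python
import ipaddress


def is_ip_literal(host):
    """Return True when host is an IPv4/IPv6 literal rather than a DNS name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def dns_matches(pattern, name):
    """Simplified wildcard match: '*' may replace the whole leftmost label only."""
    p_labels = pattern.lower().split(".")
    n_labels = name.lower().split(".")
    if len(p_labels) != len(n_labels):
        return False
    if p_labels[0] == "*":
        return p_labels[1:] == n_labels[1:]
    return p_labels == n_labels


def verify_peer(peer, dns_sans, ip_sans):
    """Model of endpoint identification: IP peers use IP SANs, names use DNS SANs."""
    if is_ip_literal(peer):
        target = ipaddress.ip_address(peer)
        return any(ipaddress.ip_address(ip) == target for ip in ip_sans)
    return any(dns_matches(pattern, peer) for pattern in dns_sans)


# SANs of the controller certificate shown in this ticket.
DNS_SANS = ["*.kafka.service.consul"]
IP_SANS = ["10.87.170.6"]

print(verify_peer("10.87.170.6", DNS_SANS, IP_SANS))
print(verify_peer(
    "cp-internal-onecloud-kfkc1.node.cp-internal-onecloud.consul",
    DNS_SANS,
    IP_SANS,
))
```

Under this model the IP literal is accepted and the node's DNS name is rejected, which is consistent with the handshake failing only when the client presents a host name instead of the configured IP.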
ISSUE 2:
It looks like the KRaft controller also does a reverse DNS lookup for itself while starting, and we see the same DNS name-matching failure on the controller. Log snippet from the controller with node ID 4:
[2024-05-16 20:57:07,962] INFO [SocketServer listenerType=CONTROLLER, nodeId=4] Failed authentication with /10.87.170.83 (channelId=10.87.170.83:9097-10.87.170.83:42548-3) (SSL handshake failed) (org.apache.kafka.common.network.Selector)
[2024-05-16 20:57:11,118] INFO [ControllerRegistrationManager id=4 incarnation=HWT3UBxJSPGuefZ9xdqH-g] sendControllerRegistration: attempting to send ControllerRegistrationRequestData(controllerId=4, incarnationId=HWT3UBxJSPGuefZ9xdqH-g, zkMigrationReady=true, listeners=[Listener(name='CONTROLLER', host='10.87.170.83', port=9097, securityProtocol=1)], features=[Feature(name='metadata.version', minSupportedVersion=1, maxSupportedVersion=19)]) (kafka.server.ControllerRegistrationManager)
[2024-05-16 20:57:11,129] INFO [NodeToControllerChannelManager id=4 name=registration] Failed authentication with cp-internal-onecloud-kfkc1.cp-internal-onecloud-kfkc1/10.87.170.83 (channelId=4) (SSL handshake failed) (org.apache.kafka.common.network.Selector)
[2024-05-16 20:57:11,130] INFO [NodeToControllerChannelManager id=4 name=registration] Node 4 disconnected. (org.apache.kafka.clients.NetworkClient)
[2024-05-16 20:57:11,130] INFO [SocketServer listenerType=CONTROLLER, nodeId=4] Failed authentication with /10.87.170.83 (channelId=10.87.170.83:9097-10.87.170.83:42564-4) (SSL handshake failed) (org.apache.kafka.common.network.Selector)
[2024-05-16 20:57:11,130] ERROR [NodeToControllerChannelManager id=4 name=registration] Connection to node 4 (cp-internal-onecloud-kfkc1.cp-internal-onecloud-kfkc1/10.87.170.83:9097) failed authentication due to: SSL handshake failed (org.apache.kafka.clients.NetworkClient)
[2024-05-16 20:57:11,131] ERROR [controller-4-to-controller-registration-channel-manager]: Failed to send the following request due to authentication error: ClientRequest(expectResponse=true, callback=kafka.server.NodeToControllerRequestThread$$Lambda$850/0x00007fee184be288@41a1ff51, destination=4, correlationId=6, clientId=4, createdTimeMs=1715893031119, requestBuilder=ControllerRegistrationRequestData(controllerId=4, incarnationId=HWT3UBxJSPGuefZ9xdqH-g, zkMigrationReady=true, listeners=[Listener(name='CONTROLLER', host='10.87.170.83', port=9097, securityProtocol=1)], features=[Feature(name='metadata.version', minSupportedVersion=1, maxSupportedVersion=19)])) (kafka.server.NodeToControllerRequestThread)
[2024-05-16 20:57:11,131] ERROR [controller-4-to-controller-registration-channel-manager]: Request ControllerRegistrationRequestData(controllerId=4, incarnationId=HWT3UBxJSPGuefZ9xdqH-g, zkMigrationReady=true, listeners=[Listener(name='CONTROLLER', host='10.87.170.83', port=9097, securityProtocol=1)], features=[Feature(name='metadata.version', minSupportedVersion=1, maxSupportedVersion=19)]) failed due to authentication error with controller (kafka.server.NodeToControllerRequestThread)
org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake failed
Caused by: javax.net.ssl.SSLHandshakeException: No subject alternative DNS name matching cp-internal-onecloud-kfkc1.cp-internal-onecloud-kfkc1 found.
    at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:378)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:321)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:316)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1351)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1226)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1169)
    at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:396)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:480)
    at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1277)
    at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1264)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
    at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask.run(SSLEngineImpl.java:1209)
    at org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks(SslTransportLayer.java:435)
    at org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(SslTransportLayer.java:523)
    at org.apache.kafka.common.network.SslTransportLayer.doHandshake(SslTransportLayer.java:373)
    at org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:293)
    at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:178)
    at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:543)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:481)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:585)
    at org.apache.kafka.server.util.InterBrokerSendThread.pollOnce(InterBrokerSendThread.java:109)
    at kafka.server.NodeToControllerRequestThread.doWork(NodeToControllerChannelManager.scala:382)
    at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:131)
Caused by: java.security.cert.CertificateException: No subject alternative DNS name matching cp-internal-onecloud-kfkc1.cp-internal-onecloud-kfkc1 found.
    at java.base/sun.security.util.HostnameChecker.matchDNS(HostnameChecker.java:212)
    at java.base/sun.security.util.HostnameChecker.match(HostnameChecker.java:103)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:458)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:418)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:292)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:144)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1329)
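To see which name the operating system's resolver returns for each controller IP, a small diagnostic such as the Python sketch below could be run on the affected hosts. It only shows what the system resolver (DNS or the hosts file) maps the address to; whether Kafka itself performs this lookup is our assumption, and the JVM's resolution path may differ. The reverse_name helper and its resolver parameter are hypothetical names introduced for this sketch.

```python
import socket


def reverse_name(ip, resolver=socket.gethostbyaddr):
    """Return the name the resolver maps ip to, or None if there is no mapping.

    The resolver argument exists only so the function can be exercised
    without network access; by default the system resolver is used.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None


def report(ips, resolver=socket.gethostbyaddr):
    """Build one 'ip -> name' line per address for easy comparison with the SANs."""
    return [f"{ip} -> {reverse_name(ip, resolver)}" for ip in ips]


# Example (run on a controller or broker host):
#   print("\n".join(report(["10.87.170.83", "10.87.170.9", "10.87.170.6"])))
```

If the returned name is not covered by a DNS-type SAN in the peer's certificate, it would explain the "No subject alternative DNS name matching ..." failures in the logs above.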
Queries:
1. Given that IPs are used for communication and are present as SANs, why does inter-broker communication work fine while broker-to-controller and controller-to-controller communication fails?
2. Why is the controller doing a reverse DNS lookup? Is there a way to disable it?
Note: we do not wish to set KAFKA_CFG_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM=" " because, per our understanding, it would disable IP matching as well.
Please let me know if you need any other configuration or logs.