ZooKeeper / ZOOKEEPER-4334

SASL authentication fails when using host aliases

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.6.1
    • None
    • None
    • None
    • zookeeper sasl authentication kerberos

    Description

      I ran into an issue when using host aliases for the ZooKeeper quorum with SASL enabled. The ZooKeeper log shows the following errors:
      ```
      2021-07-12 21:04:46,437 [myid:3] - WARN [NIOWorkerThread-3:ZooKeeperServer@1661] - Client /<IP addr>:37368 failed to SASL authenticate: {}
      javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)]
      at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:199)
      at org.apache.zookeeper.server.ZooKeeperSaslServer.evaluateResponse(ZooKeeperSaslServer.java:49)
      at org.apache.zookeeper.server.ZooKeeperServer.processSasl(ZooKeeperServer.java:1650)
      at org.apache.zookeeper.server.ZooKeeperServer.processPacket(ZooKeeperServer.java:1599)
      at org.apache.zookeeper.server.NIOServerCnxn.readRequest(NIOServerCnxn.java:379)
      at org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:182)
      at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:339)
      at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
      at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)
      at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:856)
      at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342)
      at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285)
      at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:167)
      ... 11 more
      Caused by: KrbException: Checksum failed
      at sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:102)
      at sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:94)
      at sun.security.krb5.EncryptedData.decrypt(EncryptedData.java:175)
      at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:281)
      at sun.security.krb5.KrbApReq.<init>(KrbApReq.java:149)
      at sun.security.jgss.krb5.InitSecContextToken.<init>(InitSecContextToken.java:108)
      at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:829)
      ... 14 more
      Caused by: java.security.GeneralSecurityException: Checksum failed
      at sun.security.krb5.internal.crypto.dk.AesDkCrypto.decryptCTS(AesDkCrypto.java:451)
      at sun.security.krb5.internal.crypto.dk.AesDkCrypto.decrypt(AesDkCrypto.java:272)
      at sun.security.krb5.internal.crypto.Aes256.decrypt(Aes256.java:76)
      at sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:100)
      ... 20 more
      ```

      What did I do?
      1) Created host aliases for each quorum node (a, b, c): zk1, zk2, zk3
      2) Changed zoo.cfg from:
      server.1=a
      server.2=b
      server.3=c

      to:
      server.1=zk1
      server.2=zk2
      server.3=zk3

      (At this stage, after restarting the ensemble, everything works as expected.)
      3) Generated a new zookeeper.keytab containing both the alias-based and the host-based principals.
      4) Changed the Server section in jaas.conf (server side) from:
      Server {
        com.sun.security.auth.module.Krb5LoginModule required
        useKeyTab=true
        keyTab="/etc/zookeeper/conf/zookeeper.keytab"
        storeKey=true
        useTicketCache=false
        principal="zookeeper/a.com@COM";
      };

      to:
      Server {
        com.sun.security.auth.module.Krb5LoginModule required
        useKeyTab=true
        keyTab="/etc/zookeeper/conf/zookeeper.keytab"
        storeKey=true
        useTicketCache=false
        principal="zookeeper/zk1.com@COM";
      };

      From that moment, after restarting quorum members, I get the above error.
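
      A "Checksum failed" error on the accepting side usually means the service ticket could not be decrypted with the key the server holds, so it may be worth confirming that each principal in the regenerated keytab can still log in against the KDC. The following is only a minimal sketch of such a check, not something taken from ZooKeeper itself: the class name `ServerJaasEntry` and its methods are hypothetical, and the keytab path and principals are simply the ones from the jaas.conf sections above. It builds the same Server entry programmatically for each principal and attempts a keytab login.

      ```java
      import java.util.HashMap;
      import java.util.Map;
      import javax.security.auth.login.AppConfigurationEntry;
      import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
      import javax.security.auth.login.Configuration;
      import javax.security.auth.login.LoginContext;
      import javax.security.auth.login.LoginException;

      // Hypothetical helper: mirrors the "Server" jaas.conf section for one principal.
      public class ServerJaasEntry {

          /** Builds the programmatic equivalent of the Server section for the given principal. */
          public static AppConfigurationEntry forPrincipal(String principal, String keyTabPath) {
              Map<String, String> options = new HashMap<>();
              options.put("useKeyTab", "true");
              options.put("keyTab", keyTabPath);
              options.put("storeKey", "true");
              options.put("useTicketCache", "false");
              options.put("principal", principal);
              return new AppConfigurationEntry(
                      "com.sun.security.auth.module.Krb5LoginModule",
                      LoginModuleControlFlag.REQUIRED,
                      options);
          }

          /** Attempts a keytab login for the principal; returns false if the login fails. */
          public static boolean canLogin(String principal, String keyTabPath) {
              Configuration conf = new Configuration() {
                  @Override
                  public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
                      return new AppConfigurationEntry[] {forPrincipal(principal, keyTabPath)};
                  }
              };
              try {
                  LoginContext lc = new LoginContext("Server", null, null, conf);
                  lc.login();
                  lc.logout();
                  return true;
              } catch (LoginException e) {
                  System.err.println(principal + ": " + e.getMessage());
                  return false;
              }
          }

          public static void main(String[] args) {
              // Path and principals copied from the report; adjust for the real environment.
              String keyTab = "/etc/zookeeper/conf/zookeeper.keytab";
              for (String p : new String[] {"zookeeper/a.com@COM", "zookeeper/zk1.com@COM"}) {
                  System.out.println(p + " login ok: " + canLogin(p, keyTab));
              }
          }
      }
      ```

      A successful login only shows that the keytab key for that principal matches the KDC. It does not show which principal a client actually requests a ticket for.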

      Now, why do I do this?
      To allow other services such as ZKFC, HBase, HDFS, and YARN to connect to the quorum using aliases. Interestingly, without changing the ZooKeeper principal, HBase works perfectly, but the other three services fail with:
      ```
      <2021-07-12T20:45:19.491+0200> <INFO> <org.apache.zookeeper.ZooKeeper>: <Initiating client connection, connectString=zk01.com:2181,zk02.com:2181,zk03.com:2181 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3246fb96>
      <2021-07-12T20:45:19.519+0200> <INFO> <org.apache.zookeeper.Login>: <Client successfully logged in.>
      <2021-07-12T20:45:19.521+0200> <INFO> <org.apache.zookeeper.Login>: <TGT refresh thread started.>
      <2021-07-12T20:45:19.524+0200> <INFO> <org.apache.zookeeper.Login>: <TGT valid starting at: Mon Jul 12 20:45:19 CEST 2021>
      <2021-07-12T20:45:19.524+0200> <INFO> <org.apache.zookeeper.Login>: <TGT expires: Tue Jul 13 21:45:19 CEST 2021>
      <2021-07-12T20:45:19.524+0200> <INFO> <org.apache.zookeeper.Login>: <TGT refresh sleeping until: Tue Jul 13 17:05:16 CEST 2021>
      <2021-07-12T20:45:19.524+0200> <INFO> <org.apache.zookeeper.client.ZooKeeperSaslClient>: <Client will use GSSAPI as SASL mechanism.>
      <2021-07-12T20:45:19.530+0200> <INFO> <org.apache.zookeeper.ClientCnxn>: <Opening socket connection to server zk02.com/<ip addr>:2181. Will attempt to SASL-authenticate using Login Context section 'Client'>
      <2021-07-12T20:45:19.535+0200> <INFO> <org.apache.zookeeper.ClientCnxn>: <Socket connection established to zk02.com/<ip addr>:2181, initiating session>
      <2021-07-12T20:45:19.543+0200> <INFO> <org.apache.zookeeper.ClientCnxn>: <Session establishment complete on server zk02.com/<ip addr>:2181, sessionid = 0x200247870fb0007, negotiated timeout = 10000>
      <2021-07-12T20:45:19.561+0200> <ERROR> <org.apache.zookeeper.client.ZooKeeperSaslClient>: <SASL authentication failed using login context 'Client' with exception: {}>
      javax.security.sasl.SaslException: Error in authenticating with a Zookeeper Quorum member: the quorum member's saslToken is null.
      at org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslToken(ZooKeeperSaslClient.java:279)
      at org.apache.zookeeper.client.ZooKeeperSaslClient.respondToServer(ZooKeeperSaslClient.java:242)
      at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:805)
      at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94)
      at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
      at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145)
      <2021-07-12T20:45:19.564+0200> <INFO> <org.apache.zookeeper.ClientCnxn>: <Unable to read additional data from server sessionid 0x200247870fb0007, likely server has closed socket, closing socket connection and attempting reconnect>
      <2021-07-12T20:45:19.671+0200> <INFO> <org.apache.hadoop.ha.ActiveStandbyElector>: <Session connected.>
      <2021-07-12T20:45:19.672+0200> <ERROR> <org.apache.hadoop.hdfs.tools.DFSZKFailoverController>: <DFSZKFailOverController exiting due to earlier exception java.io.IOException: Couldn't determine existence of znode
      ```
      When I change the ZooKeeper principal, HBase starts failing with this error, while the other services, except for ZooKeeper itself, somehow work fine. After that, I cannot connect manually to the ZK quorum using zkCli or zookeeper-client with any combination of principals.

      I wonder if that may have something to do with the "Server environment:host.name=" pointing to the canonical name (and not the alias) during the startup. The same happens after specifying the alias with clientPortAddress=.
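
      One way to test this hypothesis is to compare the alias with the canonical name the JVM resolves for it. If a client canonicalizes the server hostname before building the server principal (the ZooKeeper client appears to control this through the `zookeeper.sasl.client.canonicalize.hostname` system property, though that is worth verifying for 3.6.1), it would request a ticket for the canonical name rather than the alias. The sketch below is an assumption-laden illustration: the class `AliasPrincipalCheck` and its methods are hypothetical, and the `zookeeper` service name and `COM` realm are taken from the principals in this report.

      ```java
      import java.net.InetAddress;
      import java.net.UnknownHostException;

      // Hypothetical diagnostic: shows which service principal each naming choice would produce.
      public class AliasPrincipalCheck {

          /** Builds a Kerberos service principal of the form service/host@REALM. */
          public static String servicePrincipal(String service, String host, String realm) {
              return service + "/" + host + "@" + realm;
          }

          /** Returns the canonical name the JVM reports for a host name or alias. */
          public static String canonicalName(String hostOrAlias) throws UnknownHostException {
              return InetAddress.getByName(hostOrAlias).getCanonicalHostName();
          }

          public static void main(String[] args) throws UnknownHostException {
              // Pass the alias (for example zk1.com) and the realm as arguments.
              String alias = args.length > 0 ? args[0] : "localhost";
              String realm = args.length > 1 ? args[1] : "COM";
              String canonical = canonicalName(alias);

              System.out.println("alias:     " + alias);
              System.out.println("canonical: " + canonical);
              System.out.println("principal if the alias is used as-is:   "
                      + servicePrincipal("zookeeper", alias, realm));
              System.out.println("principal if the name is canonicalized: "
                      + servicePrincipal("zookeeper", canonical, realm));
          }
      }
      ```

      If the two printed principals differ, a client that canonicalizes would ask the KDC for a ticket under the host-based principal, while a server logged in under the alias-based principal would try to decrypt it with a different key.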

          People

            Assignee: Unassigned
            Reporter: Emil Kleszcz (ekleszcz)