KAFKA-7987: A broker's ZK session may die on transient auth failure

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0
    • Component/s: None
    • Labels: None

    Description

      After a transient network issue, we saw the following log entries on a broker.

      [23:37:02,102] ERROR SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7))]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state. (org.apache.zookeeper.ClientCnxn)
      [23:37:02,102] ERROR [ZooKeeperClient] Auth failed. (kafka.zookeeper.ZooKeeperClient)
      

      The network issue prevented the broker from communicating with ZK. The broker's ZK session then expired, but the broker did not know that yet because it could not establish a connection to ZK. When the network came back, the broker tried to reconnect to ZK but failed with an auth failure (likely due to a transient KDC issue). The current logic just ignores the auth failure without trying to create a new ZK session, so the broker stays permanently in a state where it is alive but not registered in ZK.
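One possible way to handle this is sketched below: treat an auth failure like a session expiration and keep retrying session creation until it succeeds, so a transient KDC problem does not strand the broker. This is a self-contained model, not Kafka's actual code and not necessarily the fix that was applied. `SessionRecoverySketch`, `State`, `SessionFactory`, `needsNewSession`, and `recover` are all hypothetical names, and the flaky factory only simulates a KDC outage.

```java
/** Hypothetical model of a ZooKeeper client wrapper; not Kafka's ZooKeeperClient. */
public class SessionRecoverySketch {

    /** Simplified stand-in for the connection states a ZooKeeper client can report. */
    enum State { CONNECTED, DISCONNECTED, EXPIRED, AUTH_FAILED }

    /** Hypothetical hook that closes the old handle and tries to open a new session. */
    interface SessionFactory {
        State reinitialize(); // state reached by the new session attempt
    }

    /** Assumption: AUTH_FAILED, like EXPIRED, leaves the old session unusable. */
    static boolean needsNewSession(State state) {
        return state == State.EXPIRED || state == State.AUTH_FAILED;
    }

    /**
     * Retries session creation with a fixed backoff.
     * Returns the number of attempts used, or -1 if all attempts failed
     * or the thread was interrupted while waiting.
     */
    static int recover(SessionFactory factory, int maxAttempts, long backoffMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (factory.reinitialize() == State.CONNECTED) {
                return attempt;
            }
            try {
                Thread.sleep(backoffMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return -1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulated transient KDC issue: two auth failures, then success.
        int[] calls = {0};
        SessionFactory flaky =
            () -> ++calls[0] < 3 ? State.AUTH_FAILED : State.CONNECTED;

        if (needsNewSession(State.AUTH_FAILED)) {
            System.out.println("recovered after attempts: " + recover(flaky, 5, 10));
        }
    }
}
```

In a real broker the retry would more likely run on a background scheduler than block the event thread, and the broker would need to re-register in ZK once the new session is established.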

       

            People

              Assignee: Unassigned
              Reporter: Jun Rao (junrao)