Uploaded image for project: 'ZooKeeper'
  1. ZooKeeper
  2. ZOOKEEPER-4275

Slowness in sasl login or subject.doAs() causes zk client to falsely assume that the server did not respond, closes connection and goes to unnecessary retries

    XMLWordPrintableJSON

Details

    Description

      Zookeeper client does sasl auth (login and subject.doAs())as a preset before attempting a connection to server.
      If there is a delay in sasl auth (possibly due to slow Kerberos communication), ZK client falsely assumes that the zk server did not respond and runs in to unnecessary multiple retries.

      Client configuration:
      "zookeeper.session.timeout" = "3000"
      "zookeeper.recovery.retry" = "1"
      "zookeeper.recovery.retry.intervalmill" = "500"

      This configuration translates to  

      connect timeout as 1000ms
      Read Timeout as 2000ms

      Example: There was a 3 second delay in logging in the user as seen from the logs below. The connection attempt was made later. However, zk client did not wait for server response but logged a timeout (3 seconds > 1 sec connect timeout), closed the connection and went to retries. Since there was a consistent delay at Kerberos master, we had seen this retries go as long as 10 mins causing requests to timeout/fail.

      Logs:

      3/23/21 4:15:32.389 AM jute.maxbuffer value is xxxxx Bytes

      3/23/21 4:15:35.395 AM Client successfully logged in.

      3/23/21 4:15:35.396 AM TGT refresh sleeping until: Wed Mar 24 00:34:31 GMT 2021

      3/23/21 4:15:35.396 AM TGT refresh thread started.

      3/23/21 4:15:35.396 AM Client will use GSSAPI as SASL mechanism.

      3/23/21 4:15:35.396 AM TGT expires:                  xxx Mar xx 04:15:35 GMT 2021

      3/23/21 4:15:35.396 AM TGT valid starting at:        xxx Mar xx 04:15:35 GMT 2021

      3/23/21 4:15:35.397 AM Opening socket connection to server xxxxx:2181. Will attempt to SASL-authenticate using Login Context section 'Client'

      3/23/21 4:15:35.397 AM Client session timed out, have not heard from server in 3008ms for sessionid 0x0

      3/23/21 4:15:35.397 AM Client session timed out, have not heard from server in 3008ms for sessionid 0x0, closing socket connection and attempting reconnect

      3/23/21 4:15:35.498 AM TGT renewal thread has been interrupted and will exit.

      3/23/21 4:15:38.503 AM Client successfully logged in.

      3/23/21 4:15:38.503 AM TGT expires:                  xxx Mar xx 04:15:38 GMT 2021

      3/23/21 4:15:38.503 AM Client will use GSSAPI as SASL mechanism.

      3/23/21 4:15:38.503 AM TGT valid starting at:        xxx Mar xx 04:15:38 GMT 2021

      3/23/21 4:15:38.503 AM TGT refresh thread started.

      3/23/21 4:15:38.503 AM TGT refresh sleeping until: Wed Mar 24 00:10:10 GMT 2021

      3/23/21 4:15:38.506 AM Opening socket connection to server xxxxx:2181. Will attempt to SASL-authenticate using Login Context section 'Client'

      3/23/21 4:15:38.506 AM Client session timed out, have not heard from server in 3009ms for sessionid 0x0, closing socket connection and attempting reconnect

      3/23/21 4:15:38.506 AM Client session timed out, have not heard from server in 3009ms for sessionid 0x0

      3/23/21 4:15:38.606 AM TGT renewal thread has been interrupted and will exit.

      3/23/21 4:15:41.610 AM Client successfully logged in.

      3/23/21 4:15:41.611 AM TGT refresh sleeping until: xxx Mar xx 23:42:03 GMT 2021

      3/23/21 4:15:41.611 AM Client will use GSSAPI as SASL mechanism.

      3/23/21 4:15:41.611 AM TGT valid starting at:        xxx Mar xx 04:15:41 GMT 2021

      3/23/21 4:15:41.611 AM TGT expires:                  xxx Mar xx 04:15:41 GMT 2021

      3/23/21 4:15:41.611 AM TGT refresh thread started.

      3/23/21 4:15:41.612 AM Opening socket connection to server xxxxx:2181. Will attempt to SASL-authenticate using Login Context section 'Client'

      3/23/21 4:15:41.613 AM Client session timed out, have not heard from server in 3006ms for sessionid 0x0

      3/23/21 4:15:41.613 AM Client session timed out, have not heard from server in 3006ms for sessionid 0x0, closing socket connection and attempting reconnect

      Attachments

        Activity

          People

            rvaleti Ravi Kishore Valeti
            rvaleti Ravi Kishore Valeti
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 2.5h
                2.5h