Details
-
Bug
-
Status: Closed
-
Minor
-
Resolution: Fixed
-
3.5.9
Description
Zookeeper client does sasl auth (login and subject.doAs())as a preset before attempting a connection to server.
If there is a delay in sasl auth (possibly due to slow Kerberos communication), ZK client falsely assumes that the zk server did not respond and runs in to unnecessary multiple retries.
Client configuration:
"zookeeper.session.timeout" = "3000"
"zookeeper.recovery.retry" = "1"
"zookeeper.recovery.retry.intervalmill" = "500"
This configuration translates to
connect timeout as 1000ms
Read Timeout as 2000ms
Example: There was a 3 second delay in logging in the user as seen from the logs below. The connection attempt was made later. However, zk client did not wait for server response but logged a timeout (3 seconds > 1 sec connect timeout), closed the connection and went to retries. Since there was a consistent delay at Kerberos master, we had seen this retries go as long as 10 mins causing requests to timeout/fail.
Logs:
3/23/21 4:15:32.389 AM jute.maxbuffer value is xxxxx Bytes
3/23/21 4:15:35.395 AM Client successfully logged in.
3/23/21 4:15:35.396 AM TGT refresh sleeping until: Wed Mar 24 00:34:31 GMT 2021
3/23/21 4:15:35.396 AM TGT refresh thread started.
3/23/21 4:15:35.396 AM Client will use GSSAPI as SASL mechanism.
3/23/21 4:15:35.396 AM TGT expires: xxx Mar xx 04:15:35 GMT 2021
3/23/21 4:15:35.396 AM TGT valid starting at: xxx Mar xx 04:15:35 GMT 2021
3/23/21 4:15:35.397 AM Opening socket connection to server xxxxx:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
3/23/21 4:15:35.397 AM Client session timed out, have not heard from server in 3008ms for sessionid 0x0
3/23/21 4:15:35.397 AM Client session timed out, have not heard from server in 3008ms for sessionid 0x0, closing socket connection and attempting reconnect
3/23/21 4:15:35.498 AM TGT renewal thread has been interrupted and will exit.
3/23/21 4:15:38.503 AM Client successfully logged in.
3/23/21 4:15:38.503 AM TGT expires: xxx Mar xx 04:15:38 GMT 2021
3/23/21 4:15:38.503 AM Client will use GSSAPI as SASL mechanism.
3/23/21 4:15:38.503 AM TGT valid starting at: xxx Mar xx 04:15:38 GMT 2021
3/23/21 4:15:38.503 AM TGT refresh thread started.
3/23/21 4:15:38.503 AM TGT refresh sleeping until: Wed Mar 24 00:10:10 GMT 2021
3/23/21 4:15:38.506 AM Opening socket connection to server xxxxx:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
3/23/21 4:15:38.506 AM Client session timed out, have not heard from server in 3009ms for sessionid 0x0, closing socket connection and attempting reconnect
3/23/21 4:15:38.506 AM Client session timed out, have not heard from server in 3009ms for sessionid 0x0
3/23/21 4:15:38.606 AM TGT renewal thread has been interrupted and will exit.
3/23/21 4:15:41.610 AM Client successfully logged in.
3/23/21 4:15:41.611 AM TGT refresh sleeping until: xxx Mar xx 23:42:03 GMT 2021
3/23/21 4:15:41.611 AM Client will use GSSAPI as SASL mechanism.
3/23/21 4:15:41.611 AM TGT valid starting at: xxx Mar xx 04:15:41 GMT 2021
3/23/21 4:15:41.611 AM TGT expires: xxx Mar xx 04:15:41 GMT 2021
3/23/21 4:15:41.611 AM TGT refresh thread started.
3/23/21 4:15:41.612 AM Opening socket connection to server xxxxx:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
3/23/21 4:15:41.613 AM Client session timed out, have not heard from server in 3006ms for sessionid 0x0
3/23/21 4:15:41.613 AM Client session timed out, have not heard from server in 3006ms for sessionid 0x0, closing socket connection and attempting reconnect