Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.6.2
- Fix Version/s: None
- Component/s: None
Description
With the default 1 MB jute.maxbuffer, if a ZooKeeper client tries to write a
znode larger than 1 MB, the server rejects the request: it logs "Len error" and
closes the connection, and the client observes a connection loss. Some
third-party client libraries (e.g. I0Itec ZkClient) retry the operation on
connection loss, so the oversized write is retried forever, and this endless
retrying can eventually take down the ZooKeeper server (see the sketch below).
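The failure mode can be illustrated with a minimal sketch of the retry-on-connection-loss pattern; the class and loop below are hypothetical and only illustrate the behavior described above, they are not the actual I0Itec ZkClient code.

{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class RetryingWriter {
    private final ZooKeeper zk;

    public RetryingWriter(ZooKeeper zk) {
        this.zk = zk;
    }

    // Retries setData whenever the connection is lost. If the payload exceeds
    // jute.maxbuffer, every attempt is rejected server-side ("Len error") and
    // surfaces as ConnectionLossException, so this loop never terminates.
    public void setDataWithRetry(String path, byte[] data) throws Exception {
        while (true) {
            try {
                zk.setData(path, data, -1);
                return;
            } catch (KeeperException.ConnectionLossException e) {
                // server closed the connection; retry the same oversized write
            }
        }
    }
}
{code}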
Log
2021/01/04 18:49:06.372 WARN [ClientCnxn] [main-SendThread(localhost:2181)] Session 0x776989df3190104 for server localhost:2181, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Broken pipe
2021/01/04 20:03:22.535 WARN [ClientCnxn] [main-SendThread(localhost:2181)] Session 0x776989df3190104 for server localhost:2181, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Connection reset by peer
In contrast, the server-side error log carries more meaningful information:
2021-01-04 19:19:38,467 [myid:8] - WARN [NIOServerCxn.Factory:/0.0.0.0:2181:NIOServerCnxn@373] - Exception causing close of session 0x976988b591a010b due to java.io.IOException: Len error 1076482
Proposed Solution
Block large data writes on the client side as well: add a sanity check on the buffer size of the outgoing request and throw a new KeeperException to signal clients to stop retrying the same operation. This is more efficient, since the oversized request is never sent to the server, a round trip is saved, and the server does not have to tear down the connection.
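A minimal sketch of the proposed client-side check, assuming a hypothetical wrapper class, a configurable maxBufferBytes limit mirroring the server's jute.maxbuffer, and the existing KeeperException.BadArgumentsException as a stand-in for whatever new exception type is ultimately introduced:

{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class SizeCheckedClient {
    private final ZooKeeper zk;
    private final int maxBufferBytes;

    public SizeCheckedClient(ZooKeeper zk, int maxBufferBytes) {
        this.zk = zk;
        this.maxBufferBytes = maxBufferBytes;
    }

    public SizeCheckedClient(ZooKeeper zk) {
        this(zk, 1024 * 1024); // mirror the default 1 MB jute.maxbuffer
    }

    // Rejects oversized payloads locally instead of sending them to the
    // server, so no round trip is wasted and the connection stays up; the
    // non-retriable exception tells callers to stop retrying.
    public void setData(String path, byte[] data, int version) throws Exception {
        if (data != null && data.length > maxBufferBytes) {
            // Assumption: the actual exception type proposed upstream may differ.
            throw new KeeperException.BadArgumentsException(path);
        }
        zk.setData(path, data, version);
    }
}
{code}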
Issue Links
- is related to: DRILL-8426 Endless retrying zk set data for a large query (Closed)