Description
While running freon with a single-node Ratis setup, it was observed that the TimeoutScheduler holds on to the raftClientObject for at least 3 s (the default requestTimeoutDuration), even though the request has already been processed successfully and acknowledged. This builds up memory pressure and causes the Ozone client to go OOM.
Heap dump analysis from HDDS-2331 suggests the timeout scheduler is holding onto a total of 176 requests (88 writeChunk requests containing actual data and 88 putBlock requests), even though data is written sequentially, key by key, in Ozone.
Thanks adoroszlai for helping to discover this.
cc ljain msingh szetszwo jnpandey
A similar fix may be required in GrpcLogAppender as well, since it uses the same TimeoutScheduler; see the sketch below.
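Below is a minimal sketch (not Ratis code) of the retention pattern described above, under the assumption that the timeout is scheduled on a ScheduledThreadPoolExecutor: the scheduled timeout task captures the request object, so the scheduler keeps it reachable for the full requestTimeoutDuration unless the task is cancelled and removed when the reply arrives. Class and method names here (TimeoutRetentionSketch, sendWithTimeout) are illustrative only.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Minimal sketch (not Ratis code): a scheduled timeout task captures the
 *  request, so the scheduler keeps it reachable until the task fires or is
 *  explicitly cancelled and removed. */
public class TimeoutRetentionSketch {
  private final ScheduledThreadPoolExecutor scheduler = new ScheduledThreadPoolExecutor(1);

  TimeoutRetentionSketch() {
    // Without this, cancelled tasks stay in the work queue (still holding the
    // captured request) until their original delay expires.
    scheduler.setRemoveOnCancelPolicy(true);
  }

  <T> CompletableFuture<T> sendWithTimeout(Object request, CompletableFuture<T> replyFuture,
      long timeout, TimeUnit unit) {
    // The lambda captures 'request'; the queued task holds it for up to 'timeout'.
    ScheduledFuture<?> timeoutTask = scheduler.schedule(
        () -> replyFuture.completeExceptionally(
            new TimeoutException("request timed out: " + request)),
        timeout, unit);

    // Cancel (and remove) the timeout task as soon as the reply arrives,
    // releasing the captured request instead of retaining it for the full delay.
    replyFuture.whenComplete((reply, error) -> timeoutTask.cancel(false));
    return replyFuture;
  }
}
{code}

If the timeout task is only cancelled but not removed from the executor's queue (i.e. setRemoveOnCancelPolicy is left at its default), the already-acknowledged requests remain reachable for the remainder of the 3 s delay, which matches the buffer retention seen in the heap dump.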
Attachments
Issue Links
- blocks
  - HDDS-2331 Client OOME due to buffer retention (Resolved)
- breaks
  - RATIS-732 TestRaftAsyncExceptionWithGrpc.testTimeoutException times out after 100s (Resolved)
  - RATIS-733 TestRaftOutputStreamWithGrpc.testSimpleWrite times out after 30s (Resolved)
  - RATIS-734 TestRaftServerWithGrpc.testRaftClientMetrics times out after 100s (Resolved)
  - RATIS-704 Invoke sendAsync as soon as OrderedAsync is created (Resolved)