Details
- Type: Improvement
- Priority: Minor
- Status: Resolved
- Resolution: Implemented
Description
When command handling is very slow, the DataNode's command queues pile up. In one environment the following commands were queued, consuming roughly 5 GB of memory:
- Block delete command: 2.8k
- Close container command: 13k
- Close pipeline command: 2390k
- Replicate container command: 57k
This happens when the disk is almost full and command execution is very slow, while SCM keeps resending the same commands on every heartbeat.
A cap on each queue is therefore required, so that further command accumulation is rejected. The cap needs to be set per command-type queue, based on memory occupancy and on how SCM repeats the command.
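The rejection behavior described above can be sketched as a small bounded queue per command type. This is a minimal illustration, not the actual Ozone implementation: when the cap is reached, a newly received command is simply dropped, relying on SCM's heartbeat retry to resend it later.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Sketch of a per-command-type queue with a hard cap (hypothetical class,
 * not Ozone's real code). Once the cap is reached, offer() rejects new
 * commands; SCM is expected to resend them on a later heartbeat.
 */
public class CappedCommandQueue<T> {
    private final int maxSize;
    private final Deque<T> queue = new ArrayDeque<>();

    public CappedCommandQueue(int maxSize) {
        this.maxSize = maxSize;
    }

    /** Returns false (command dropped) when the queue is already full. */
    public synchronized boolean offer(T command) {
        if (queue.size() >= maxSize) {
            return false; // reject; SCM will retry via heartbeat
        }
        queue.addLast(command);
        return true;
    }

    public synchronized T poll() {
        return queue.pollFirst();
    }

    public synchronized int size() {
        return queue.size();
    }
}
```

Dropping rather than blocking is the key design choice here: the heartbeat protocol already provides retry, so back-pressure can be expressed as rejection without losing work permanently.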
Command size pattern:
- DeleteBlockCommand: 1.7 MB (covering 5,804 containers)
- ClosePipelineCommand: 130 bytes
- CreatePipelineCommand: 3.3 KB (DN info: 1 KB × 3 DNs)
- ReplicateContainerCommand: 1.2 KB (DN info: 1 KB)
- CloseContainerCommand: 1.2 KB (encoded token: 1 KB)
- DeleteContainerCommand: 100 bytes
- DeleteBlockCommand: triggered by SCM every 5 minutes by default, and repeated if responses for some blocks have not arrived. Since the operation is retried even when the DN ignores or drops it, special handling with a max queue size of 5 should be enough.
- ClosePipeline/CloseContainer/DeleteContainer: retried by SCM for every container/pipeline, so a queue size of 5000 should be enough.
- CreatePipeline/ReplicateContainer: controlled by SCM and not retried/repeated, so a queue size of 5000 can be supported.
- Other command types can follow the same default cap of 5000.
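The per-type caps above can be captured in a small lookup with 5000 as the default and DeleteBlockCommand special-cased. The class and key names here are illustrative, not Ozone's actual configuration keys.

```java
import java.util.Map;

/** Illustrative per-command-type queue caps (hypothetical names). */
public class CommandQueueCaps {
    // Default cap for command types without special handling.
    static final int DEFAULT_CAP = 5_000;

    // DeleteBlockCommand is special-cased: SCM resends it every ~5 minutes,
    // and each command is large (~1.7 MB), so a tiny queue suffices.
    static final Map<String, Integer> CAPS = Map.of(
        "DeleteBlockCommand", 5
    );

    static int capFor(String commandType) {
        return CAPS.getOrDefault(commandType, DEFAULT_CAP);
    }
}
```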
Attachments
Issue Links
- is a child of
  - HDDS-8299 Disk full situation on a leader DN may result in followers getting stuck in a retry loop (Resolved)
- links to