Details
- Type: Improvement
- Priority: Minor
- Status: Resolved
- Resolution: Implemented
Description
When command handling is very slow, the DataNode's command queues pile up. In one environment the following commands were queued, consuming roughly 5 GB of memory:
- Block delete command: 2.8k
- Close container command: 13k
- Close pipeline command: 2390k
- Replicate container command: 57k
This happens when the disk is almost full and command execution is very slow, while SCM keeps resending the same commands on every heartbeat.
A cap on each queue is therefore required, so that further command accumulation is rejected. The cap needs to be set per command-type queue, based on memory occupancy and on how SCM repeats the command.
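The rejection behavior described above can be sketched as a small bounded queue per command type. This is a minimal illustration, not the actual Ozone implementation: when the cap is reached, a newly received command is simply dropped, relying on SCM's heartbeat retry to resend it later.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Sketch of a per-command-type queue with a hard cap (hypothetical class,
 * not Ozone's real code). Once the cap is reached, offer() rejects new
 * commands; SCM is expected to resend them on a later heartbeat.
 */
public class CappedCommandQueue<T> {
    private final int maxSize;
    private final Deque<T> queue = new ArrayDeque<>();

    public CappedCommandQueue(int maxSize) {
        this.maxSize = maxSize;
    }

    /** Returns false (command dropped) when the queue is already full. */
    public synchronized boolean offer(T command) {
        if (queue.size() >= maxSize) {
            return false; // reject; SCM will retry via heartbeat
        }
        queue.addLast(command);
        return true;
    }

    public synchronized T poll() {
        return queue.pollFirst();
    }

    public synchronized int size() {
        return queue.size();
    }
}
```

Dropping rather than blocking is the key design choice here: the heartbeat protocol already provides retry, so back-pressure can be expressed as rejection without losing work permanently.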
Command size pattern:
- DeleteBlockCommand: 1.7 MB (covering 5,804 containers)
- ClosePipelineCommand: 130 bytes
- CreatePipelineCommand: 3.3 KB (DN info: 1 KB × 3 DNs)
- ReplicateContainerCommand: 1.2 KB (DN info: 1 KB)
- CloseContainerCommand: 1.2 KB (encoded token: 1 KB)
- DeleteContainerCommand: 100 bytes
- DeleteBlockCommand: triggered by SCM every 5 minutes by default, and repeated if responses for some blocks have not arrived. Since the operation is retried even when the DN ignores or drops it, special handling with a max queue size of 5 should be enough.
- ClosePipeline/CloseContainer/DeleteContainer: retried by SCM for every container/pipeline, so a queue size of 5000 should be enough.
- CreatePipeline/ReplicateContainer: controlled by SCM and not retried/repeated, so a queue size of 5000 can be supported.
- Other command types can follow the same default cap of 5000.
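The per-type caps above can be captured in a small lookup with 5000 as the default and DeleteBlockCommand special-cased. The class and key names here are illustrative, not Ozone's actual configuration keys.

```java
import java.util.Map;

/** Illustrative per-command-type queue caps (hypothetical names). */
public class CommandQueueCaps {
    // Default cap for command types without special handling.
    static final int DEFAULT_CAP = 5_000;

    // DeleteBlockCommand is special-cased: SCM resends it every ~5 minutes,
    // and each command is large (~1.7 MB), so a tiny queue suffices.
    static final Map<String, Integer> CAPS = Map.of(
        "DeleteBlockCommand", 5
    );

    static int capFor(String commandType) {
        return CAPS.getOrDefault(commandType, DEFAULT_CAP);
    }
}
```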
Attachments
Issue Links
- is a child of
  - HDDS-8299 Disk full situation on a leader DN may result in followers getting stuck in a retry loop (Resolved)
- links to