[HDDS-2592] Add Datanode command to allow the datanode to persist its admin state - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.5.0
Fix Version/s: None
Component/s: Ozone Datanode, SCM
Labels:
- pull-request-available

Target Version/s: 0.5.0

Description

When the operational state of a datanode changes, an asynchronous command should be triggered to persist the new state on the datanode. For maintenance mode, the datanode should also store the maintenance end time. The datanode then reports the new state (and the optional maintenance end time) back to SCM in its heartbeat.
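A minimal sketch of the datanode side, assuming a hypothetical `SetNodeOperationalStateCommand` payload and a local JSON file as the persistence mechanism (neither is taken from the Ozone code base). The handler persists the new state, plus the maintenance end time for maintenance states, and the heartbeat reports whatever was last persisted:

```python
import json
import os
import tempfile
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OperationalState(Enum):
    # Hypothetical state names, modelled on the states discussed in this issue.
    IN_SERVICE = "IN_SERVICE"
    DECOMMISSIONING = "DECOMMISSIONING"
    DECOMMISSIONED = "DECOMMISSIONED"
    ENTERING_MAINTENANCE = "ENTERING_MAINTENANCE"
    IN_MAINTENANCE = "IN_MAINTENANCE"


MAINTENANCE_STATES = {OperationalState.ENTERING_MAINTENANCE, OperationalState.IN_MAINTENANCE}


@dataclass
class SetNodeOperationalStateCommand:
    # Hypothetical command sent asynchronously from SCM to a datanode.
    state: OperationalState
    maintenance_end_epoch_s: Optional[int] = None  # only meaningful for maintenance


class DatanodeAdminStateStore:
    """Persists the admin state on local disk so it survives a datanode restart."""

    def __init__(self, path: str):
        self.path = path

    def handle(self, cmd: SetNodeOperationalStateCommand) -> None:
        record = {"state": cmd.state.value}
        if cmd.state in MAINTENANCE_STATES:
            record["maintenance_end"] = cmd.maintenance_end_epoch_s
        # Write to a temporary file, then rename over the old one, so a crash
        # never leaves a half-written state file behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(record, f)
        os.replace(tmp, self.path)

    def heartbeat_payload(self) -> dict:
        # Report the last persisted state; assume IN_SERVICE if nothing was ever set.
        if not os.path.exists(self.path):
            return {"state": OperationalState.IN_SERVICE.value}
        with open(self.path) as f:
            return json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = DatanodeAdminStateStore(os.path.join(d, "admin-state.json"))
        store.handle(SetNodeOperationalStateCommand(OperationalState.IN_MAINTENANCE, 1700000000))
        print(store.heartbeat_payload())
```

Because the state is read back from disk for every heartbeat, a restarted datanode reports the same state it had before the restart, which is what lets SCM recover the information later.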

The purpose of the DN persisting this information and reporting it back to SCM in its heartbeats is to allow the operational state to be recovered after an SCM reboot, as SCM does not persist any of this information itself. It also allows "Recon" to learn the datanode states.

If SCM is restarted, it loses all knowledge of the datanodes. When they re-register, each datanode reports its operational state and SCM can set its own view correctly.

Outside of registration (i.e. during normal heartbeats), the SCM state is the source of truth for the operational state. If a DN heartbeat reports a state that differs from SCM's, SCM should issue another command telling the datanode to set its state to the SCM value. The mismatch may simply be caused by a not-yet-processed command triggered by the SCM state change, but the worst case is then one extra command sent to the datanode. As this is a very lightweight command, that is not a problem.
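The registration and heartbeat rules described above can be sketched as follows. `ScmNodeStateReconciler`, its fields, and the string state names are hypothetical, and comparing the maintenance end time as well as the state is an assumption rather than something this issue specifies:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (operational state, maintenance end time or None); string states are illustrative.
NodeState = Tuple[str, Optional[int]]


@dataclass
class ScmNodeStateReconciler:
    """Hypothetical in-memory view SCM keeps of each datanode's admin state."""

    states: Dict[str, NodeState] = field(default_factory=dict)
    # Commands waiting to be returned to each datanode in a heartbeat response.
    outbox: Dict[str, List[NodeState]] = field(default_factory=dict)

    def _send(self, dn: str, state: NodeState) -> None:
        self.outbox.setdefault(dn, []).append(state)

    def set_state(self, dn: str, state: str, maint_end: Optional[int] = None) -> None:
        # An admin action changes SCM's state and asynchronously tells the DN to persist it.
        self.states[dn] = (state, maint_end)
        self._send(dn, (state, maint_end))

    def on_register(self, dn: str, reported: str, maint_end: Optional[int] = None) -> None:
        # After an SCM restart SCM knows nothing, so it adopts the DN's persisted state.
        self.states[dn] = (reported, maint_end)

    def on_heartbeat(self, dn: str, reported: str, maint_end: Optional[int] = None) -> None:
        expected = self.states.get(dn)
        if expected is None:
            return  # unknown node: it has to register first
        # Outside registration SCM is the source of truth. A mismatch may only mean
        # the DN has not processed the last command yet; resending is cheap.
        if (reported, maint_end) != expected:
            self._send(dn, expected)
```

In this sketch a stale heartbeat simply queues a duplicate command instead of trying to detect in-flight commands, matching the observation that the worst case is one redundant, lightweight command.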

One open question is whether to persist intermediate states on the DN. For example, when decommissioning, the DN would first persist "Decommissioning" and then transition to "Decommissioned" once SCM is satisfied that all containers are replicated. It would be easy to persist each of these states on the datanode in turn. Alternatively, we set only the end state (Decommissioned) on the datanode and let SCM move the node to that state. With the latter approach, if SCM is restarted, the DN will report "Decommissioned" on registration, but SCM will set its internal state to Decommissioning and then ensure all containers are replicated before transitioning the node to Decommissioned. This seems the safer approach, but there are also advantages to tracking the intermediate states on the DNs.
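Under the second option (the datanode persists only the end state), SCM's handling of a reported state on registration might look like the sketch below. The function names are hypothetical, `containers_replicated` stands in for SCM's real replication check, and mapping a reported `IN_MAINTENANCE` back to `ENTERING_MAINTENANCE` is an assumption by analogy; the issue only describes the decommission case:

```python
from typing import Callable

# Hypothetical mapping from a persisted end state to the intermediate state
# SCM starts from after a restart, until it has re-verified replication.
END_TO_INTERMEDIATE = {
    "DECOMMISSIONED": "DECOMMISSIONING",
    "IN_MAINTENANCE": "ENTERING_MAINTENANCE",  # assumed by analogy, not stated in the issue
}
INTERMEDIATE_TO_END = {v: k for k, v in END_TO_INTERMEDIATE.items()}


def state_on_registration(reported: str) -> str:
    # SCM does not trust a reported end state after a restart; it starts from
    # the intermediate state. Other states (e.g. IN_SERVICE) are adopted as-is.
    return END_TO_INTERMEDIATE.get(reported, reported)


def advance(scm_state: str, containers_replicated: Callable[[], bool]) -> str:
    # Move from an intermediate state to its end state only once SCM is
    # satisfied that the node's containers are sufficiently replicated.
    target = INTERMEDIATE_TO_END.get(scm_state)
    if target is not None and containers_replicated():
        return target
    return scm_state
```

The datanode never needs to know about the intermediate state here; SCM alone decides when the node has really reached the end state it persisted.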

Issue Links

links to

GitHub Pull Request #521

People

Assignee: Stephen O'Donnell

Reporter: Stephen O'Donnell

Votes: 0

Watchers: 3

Dates

Created: 20/Nov/19 22:18

Updated: 31/Mar/20 21:59

Resolved: 31/Mar/20 21:59

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 20m