[HDDS-10239] Storage Container Reconciliation - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Ozone Datanode, SCM
Labels:
- pull-request-available

Description

Ideally, a healthy Ozone cluster would contain only open and closed containers. In practice, container replicas commonly end up in a mix of states, including quasi-closed and unhealthy, that the current system cannot resolve to cleanly closed replicas. These states are often caused by bugs or overly broad failure handling on the write path. While we should fix those causes, they expose a deeper problem: Ozone cannot reconcile mismatched container states on its own, regardless of how they arose. This has led to significant complexity in the replication manager, which must handle cases where only quasi-closed and unhealthy replicas are available, especially during decommissioning.

Even when all replicas are closed, the system assumes that they are identical and has no way to verify this. Checksums are computed for individual chunks within each container, but if two container replicas somehow end up with chunks that differ in length or content, despite both being marked closed and passing their local checksum checks, the system has no way to detect or resolve the anomaly.

This Jira proposes a container reconciliation protocol to solve these problems. After implementing the proposal:
1. It should be possible for a cluster to progress to a state where it has only properly replicated closed and open containers.
2. We can verify the equality and integrity of all closed containers.
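To make goal 2 concrete, here is a minimal sketch of how two closed replicas could be compared by rolling per-chunk checksums up into a per-block and then a per-container digest. It is an illustration only and not the design in the linked doc: the `Replica` model, the function names, and the use of SHA-256 are all assumptions made for this example.

```python
import hashlib
from typing import Dict, List

# Hypothetical model (not Ozone's on-disk format):
# a replica maps a block ID to the ordered list of that block's chunk bytes.
Replica = Dict[int, List[bytes]]


def block_digest(chunks: List[bytes]) -> bytes:
    """Digest of one block: covers each chunk's length and checksum, in order."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(len(chunk).to_bytes(8, "big"))
        h.update(hashlib.sha256(chunk).digest())
    return h.digest()


def container_digest(replica: Replica) -> bytes:
    """Digest of a whole replica: covers every block ID and block digest."""
    h = hashlib.sha256()
    for block_id in sorted(replica):
        h.update(block_id.to_bytes(8, "big"))
        h.update(block_digest(replica[block_id]))
    return h.digest()


def diff_replicas(a: Replica, b: Replica) -> List[int]:
    """Return the block IDs on which two replicas disagree (empty if equal)."""
    if container_digest(a) == container_digest(b):
        return []  # top-level digests match, so no per-block comparison is needed
    mismatched = []
    for block_id in sorted(set(a) | set(b)):
        if block_id not in a or block_id not in b:
            mismatched.append(block_id)  # block missing from one replica
        elif block_digest(a[block_id]) != block_digest(b[block_id]):
            mismatched.append(block_id)  # chunk length or content differs
    return mismatched


# Two "closed" replicas whose block 2 differs in chunk length.
replica_a: Replica = {1: [b"aaaa", b"bbbb"], 2: [b"cccc"]}
replica_b: Replica = {1: [b"aaaa", b"bbbb"], 2: [b"ccc"]}
print(diff_replicas(replica_a, replica_b))  # [2]
```

Comparing a single top-level digest first keeps the common case (identical replicas) cheap, and the per-block digests narrow down where a repair would be needed when they differ.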

The design doc is linked to this issue as a markdown pull request so that it can be commented on inline.

Issue Links

is duplicated by
- HDDS-9280 Quasi-closed container with unhealthy replicas may remain under-replicated in 4 node cluster (Resolved)

relates to
- HDDS-10931 Schedule on demand scan of containers after import (Open)
- HDDS-11593 Improve container scanner metrics (In Progress)
- HDDS-7094 Enable Datanode side CRC checks by default (Reopened)

links to
- GitHub Pull Request #6121

Sub-Tasks

#	Summary	Status	Assignee
1.	Implement container comparison and repair logic within datanodes	Patch Available	Aswin Shakil
2.	Add new tests for container scanner detecting multiple errors in one container	Patch Available	Ethan Rose
3.	Make container scanner generate merkle trees during the scan	Open	Ethan Rose
4.	Coordinate container reconciliation with container deletion and replication	Open	Unassigned
5.	Allow reconciliation and scanner to move replicas out of the UNHEALTHY state	Open	Aswin Shakil
6.	Handle backwards compatibility for containers created before reconciliation	Open	Unassigned
7.	Consider allowing reconciliation when not all replicas have reached closed state	Open	Unassigned
8.	SCMExceptions resulting from admin CLI commands are treated as retriable	Open	Unassigned
9.	Restrict reconciliation requests by datanode status	Open	Unassigned
10.	Extend container repair capabilities to the block level	Open	Unassigned
11.	Combine datanode clients for reconciliation and EC reconstruction	Open	Unassigned
12.	Basic SCM co-ordination	Open	Unassigned
13.	Optimize checksum calculations in container merkle tree	Open	Ritesh Shukla
14.	Add metrics specific to reconciliation tasks	Open	Unassigned
15.	Use zero-copy for readMerkleTree API	Open	Unassigned

People

Assignee: Ethan Rose
Reporter: Ethan Rose
Votes: 0
Watchers: 5

Dates

Created: 30/Jan/24 01:37
Updated: 17/Oct/24 15:20