  Apache Ozone > HDDS-7759 Improve Ozone Replication Manager > HDDS-8838

Orphan blocks can leave empty container stuck deleting forever


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: None

    Description

      A container can get stuck in the DELETING state in the following scenario:

      Shut down a node and write an EC key shortly afterwards, before the node has been marked stale. In this case, the client can do the following:

      1. Attempts to write to a pipeline that contains the now-stopped node.

      2. Fails to write the stripe and abandons it. The client will seamlessly rewrite the stripe to a new container.

      3. This could be a new container with no other data. It will now have some chunk data written on some of the replicas, but no putBlock will have arrived.

      4. On the DN, the RocksDB will not have an entry for this block, but there will be a block file on disk. The DN will therefore refuse to delete the container if asked.

      5. The stale node handler triggers the close of the pipeline. The replicas are reported to SCM with zero bytes and zero keys and hence the container goes to DELETING after it is CLOSED.

      6. SCM sends delete commands, which are dropped by the DN due to the chunks written, and the container is stuck deleting forever (a simplified sketch of this check follows the list).
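
      The crux is that the DN's notion of "empty", used when deciding whether to honour a non-forced delete, is based on what is physically on disk rather than only on the block entries in RocksDB. The code below is a hypothetical, simplified sketch of that kind of guard (class and method names are made up, this is not the actual Ozone code):

      import java.io.IOException;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.nio.file.Paths;
      import java.util.stream.Stream;

      // Hypothetical sketch: a non-forced container delete is only honoured when
      // the chunks directory holds no files at all, regardless of what the
      // container RocksDB says. An orphan .block file therefore blocks the
      // delete indefinitely.
      public class ContainerDeleteGuard {

        static boolean canDelete(Path chunksDir, boolean force) throws IOException {
          if (force) {
            return true;                          // forced deletes skip the emptiness check
          }
          try (Stream<Path> files = Files.list(chunksDir)) {
            return !files.findAny().isPresent();  // any leftover file => refuse to delete
          }
        }

        public static void main(String[] args) throws IOException {
          Path chunksDir = Paths.get(args[0]);    // e.g. .../containerDir0/1/chunks
          boolean force = args.length > 1 && Boolean.parseBoolean(args[1]);
          System.out.println("delete allowed: " + canDelete(chunksDir, force));
        }
      }

      In this scenario, the leftover 111677748019200001.block file shown further down keeps such a check returning false for the force: false deletes that SCM sends.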

      The following logs show this happening. The key should have gone into container 1, as this was a fresh cluster with no previous writes, but node 6 was stopped. You can see the pipelines that include node_6 being finalized, and container 1 being closed as a result:

      scm_1        2023-06-13 10:56:07,420 [EventQueue-StaleNodeForStaleNodeHandler] INFO node.StaleNodeHandler: Datanode 7c8bf940-3520-41e4-a1b3-672359713c5b(ozone_datanode_6.ozone_default/172.30.0.12) moved to stale state. Finalizing its pipelines [PipelineID=2f228c83-dad4-40cd-8415-b153564738de, PipelineID=59d63d29-64f1-4a95-94cc-e75c65abb94b, PipelineID=dac2f9a5-f274-4baa-9409-3d203b794905]
      scm_1        2023-06-13 10:56:07,435 [EventQueue-StaleNodeForStaleNodeHandler] INFO pipeline.PipelineManagerImpl: Container #1 closed for pipeline=PipelineID=2f228c83-dad4-40cd-8415-b153564738de
      scm_1        2023-06-13 10:56:07,437 [EventQueue-CloseContainerForCloseContainerEventHandler] INFO container.CloseContainerEventHandler: Close container Event triggered for container : #1, current state: CLOSING
      

      Container 1 has moved to CLOSING; then we see it transition to CLOSED and then to DELETING:

      scm_1        2023-06-13 10:56:16,091 [FixedThreadPoolWithAffinityExecutor-9-0] INFO container.IncrementalContainerReportHandler: Moving container #1 to CLOSED state, datanode 9c7eab65-20fb-49b6-85cb-e0d4df91b14e(ozone_datanode_3.ozone_default/172.30.0.3) reported CLOSED replica with index 5.
      scm_1        2023-06-13 10:56:28,492 [ReplicationMonitor] INFO replication.ReplicationManager: Sending command [deleteContainerCommand: containerID: 1, replicaIndex: 5, force: false] for container ContainerInfo{id=#1, state=CLOSED, stateEnterTime=2023-06-13T10:56:16.095763Z, pipelineID=PipelineID=2f228c83-dad4-40cd-8415-b153564738de, owner=om1} to 9c7eab65-20fb-49b6-85cb-e0d4df91b14e(ozone_datanode_3.ozone_default/172.30.0.3) with datanode deadline 1686653848492 and scm deadline 1686653878492
      scm_1        2023-06-13 10:56:28,493 [ReplicationMonitor] INFO replication.ReplicationManager: Sending command [deleteContainerCommand: containerID: 1, replicaIndex: 4, force: false] for container ContainerInfo{id=#1, state=CLOSED, stateEnterTime=2023-06-13T10:56:16.095763Z, pipelineID=PipelineID=2f228c83-dad4-40cd-8415-b153564738de, owner=om1} to 02968363-81b4-44b9-bd9b-bfb03eb96193(ozone_datanode_1.ozone_default/172.30.0.10) with datanode deadline 1686653848493 and scm deadline 1686653878493
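
      For contrast with the DN-side guard sketched earlier, here is a rough, hypothetical sketch of the SCM-side decision from step 5: SCM only sees what the replicas report, so once the container is CLOSED and every replica reports zero keys and zero bytes it looks empty, enters DELETING, and non-forced delete commands like the ones above are queued. Again, the names below are made up and this is only a simplification:

      import java.util.Arrays;
      import java.util.List;

      // Hypothetical sketch of the SCM-side emptiness decision: it is based purely
      // on the key/byte counts the datanodes report for each replica, so orphan
      // chunk files that never got a putBlock are invisible to it.
      public class EmptyContainerCheck {

        static class ReplicaReport {
          final int replicaIndex;
          final long keyCount;
          final long bytesUsed;
          ReplicaReport(int replicaIndex, long keyCount, long bytesUsed) {
            this.replicaIndex = replicaIndex;
            this.keyCount = keyCount;
            this.bytesUsed = bytesUsed;
          }
        }

        static boolean looksEmptyToScm(List<ReplicaReport> reports) {
          return !reports.isEmpty()
              && reports.stream().allMatch(r -> r.keyCount == 0 && r.bytesUsed == 0);
        }

        public static void main(String[] args) {
          // Replicas of container 1 as reported after the pipeline close.
          List<ReplicaReport> reports = Arrays.asList(
              new ReplicaReport(5, 0, 0),
              new ReplicaReport(4, 0, 0));
          if (looksEmptyToScm(reports)) {
            // This is roughly the point where the real system enters DELETING and
            // queues a non-forced deleteContainerCommand per replica.
            System.out.println("container looks empty -> DELETING, send non-forced deletes");
          }
        }
      }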
      

      Getting the key info, we can see the key is actually in container 2. What likely happened is that the client tried to write the data to datanode_6 and the parity to datanodes 3 and 1. The stripe was abandoned and re-written to container 2, so the putBlock to update the length likely never made it onto the parity nodes, but some block data did. Now the DNs will not delete the container, as it is not empty, and it remains stuck in DELETING forever.

      ozone sh key info /vol1/bucket1/key1
      {
        "volumeName" : "vol1",
        "bucketName" : "bucket1",
        "name" : "key1",
        "dataSize" : 175,
        "creationTime" : "2023-06-13T10:56:04.158Z",
        "modificationTime" : "2023-06-13T10:56:35.186Z",
        "replicationConfig" : {
          "data" : 3,
          "parity" : 2,
          "ecChunkSize" : 1048576,
          "codec" : "RS",
          "replicationType" : "EC",
          "requiredNodes" : 5
        },
        "metadata" : { },
        "ozoneKeyLocations" : [ {
          "containerID" : 2,
          "localID" : 111677748019200002,
          "length" : 175,
          "offset" : 0,
          "keyOffset" : 0
        } ]
      }
      

      Checking one of the DN storage directories:

      bash-4.2$ pwd
      /data/hdds/hdds/CID-684c8137-b002-47d8-b134-7831df54e517/current/containerDir0/1
      bash-4.2$ ls -al chunks/
      total 12
      drwxr-xr-x 2 hadoop users 4096 Jun 13 10:56 .
      drwxr-xr-x 4 hadoop users 4096 Jun 13 10:56 ..
      -rw-r--r-- 1 hadoop users  175 Jun 13 10:56 111677748019200001.block
      

      Dumped the container RocksDB with:

      ozone debug ldb --db=container.db scan --column_family=block_data -l=-1
      

      The blockID for the file above is not present, so the container is "empty" as far as the DB is concerned, but it has an orphan block file in it.
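
      To confirm an orphan like this, one approach is to compare the .block file names in the chunks directory against the localIDs present in the block_data column family. The helper below is hypothetical and assumes the localIDs from the ldb dump have already been extracted into a plain text file, one ID per line:

      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.nio.file.Paths;
      import java.util.HashSet;
      import java.util.Set;
      import java.util.stream.Stream;

      // Hypothetical helper: lists <localID>.block files in a container's chunks
      // directory and prints any whose localID does not appear in the supplied
      // list of IDs taken from the container RocksDB dump. Those are orphans.
      public class OrphanBlockFinder {

        public static void main(String[] args) throws Exception {
          Path chunksDir = Paths.get(args[0]);  // e.g. .../containerDir0/1/chunks
          Path dbIdsFile = Paths.get(args[1]);  // localIDs extracted from the ldb dump

          Set<String> dbIds = new HashSet<>(Files.readAllLines(dbIdsFile));

          try (Stream<Path> files = Files.list(chunksDir)) {
            files.map(p -> p.getFileName().toString())
                 .filter(name -> name.endsWith(".block"))
                 .map(name -> name.substring(0, name.length() - ".block".length()))
                 .filter(id -> !dbIds.contains(id))
                 .forEach(id -> System.out.println("orphan block: " + id));
          }
        }
      }

      Run against the directory above with an empty ID list (container 1's block_data has no entries), this would flag 111677748019200001 as the orphan.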

            People

              Assignee: Ashish Kumar (ashishk)
              Reporter: Stephen O'Donnell (sodonnell)
