Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
A container can get stuck in the DELETING state in the following scenario:
Shut down a node and write an EC key shortly afterwards, before the node has been marked stale. In this case, the client can do the following:
1. Attempt to write to a pipeline that includes the now-stopped node.
2. The write of the stripe fails and the stripe is abandoned. The client seamlessly rewrites the stripe to a new container.
3. The abandoned container could be a new container with no other data. It will now have chunks written to some of the replicas, but no putBlock will have arrived.
4. On the DN, RocksDB will not have an entry for this block, but there will be a block file on disk. The DN will therefore refuse to delete the container if asked to.
5. The stale node handler triggers the close of the pipeline. The replicas are reported to SCM with zero bytes and zero keys, so the container goes to DELETING after it is CLOSED.
6. SCM sends delete commands, which are dropped by the DN due to the chunks written, and the container is stuck in DELETING forever (see the sketch after this list).
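To make the deadlock concrete, here is a minimal sketch of the datanode-side check described in steps 4 and 6. It is not the real Ozone datanode code; the class and field names are hypothetical. The point is that the container metadata reports zero blocks, but the orphan chunk file keeps the directory non-empty, so every non-force delete command is rejected.

// Hypothetical sketch of the datanode-side decision described above.
// Names (OrphanAwareDeleteCheck, ContainerData, ...) are illustrative, not the real Ozone classes.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class OrphanAwareDeleteCheck {

  /** Metadata view of the container as reported to SCM. */
  static class ContainerData {
    long keyCount;   // 0 in this scenario: no putBlock ever arrived
    Path chunksDir;  // e.g. .../containerDir0/1/chunks
  }

  /**
   * A container is only treated as empty if BOTH the metadata says so
   * AND the chunks directory contains no files. The orphan .block file
   * written by the abandoned EC stripe makes this return false forever.
   */
  static boolean isEmpty(ContainerData c) throws IOException {
    if (c.keyCount > 0) {
      return false;
    }
    try (Stream<Path> files = Files.list(c.chunksDir)) {
      return files.findAny().isEmpty();
    }
  }

  /** Handling of a deleteContainerCommand from SCM. */
  static void handleDeleteCommand(ContainerData c, boolean force) throws IOException {
    if (!force && !isEmpty(c)) {
      // SCM keeps the container in DELETING and retries, the datanode
      // keeps refusing: the container is stuck.
      throw new IOException("Non-empty container, refusing non-force delete");
    }
    // ... delete container files and metadata ...
  }
}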
The following logs show this happening. The key should have gone into container 1, as this was a fresh cluster with no previous writes, but node 6 was stopped. You can see the pipelines that include node_6 being closed, affecting container 1:
scm_1  2023-06-13 10:56:07,420 [EventQueue-StaleNodeForStaleNodeHandler] INFO node.StaleNodeHandler: Datanode 7c8bf940-3520-41e4-a1b3-672359713c5b(ozone_datanode_6.ozone_default/172.30.0.12) moved to stale state. Finalizing its pipelines [PipelineID=2f228c83-dad4-40cd-8415-b153564738de, PipelineID=59d63d29-64f1-4a95-94cc-e75c65abb94b, PipelineID=dac2f9a5-f274-4baa-9409-3d203b794905]
scm_1  2023-06-13 10:56:07,435 [EventQueue-StaleNodeForStaleNodeHandler] INFO pipeline.PipelineManagerImpl: Container #1 closed for pipeline=PipelineID=2f228c83-dad4-40cd-8415-b153564738de
scm_1  2023-06-13 10:56:07,437 [EventQueue-CloseContainerForCloseContainerEventHandler] INFO container.CloseContainerEventHandler: Close container Event triggered for container : #1, current state: CLOSING
Container 1 has moved to CLOSING; then we see it transition to CLOSED and then to DELETING:
scm_1  2023-06-13 10:56:16,091 [FixedThreadPoolWithAffinityExecutor-9-0] INFO container.IncrementalContainerReportHandler: Moving container #1 to CLOSED state, datanode 9c7eab65-20fb-49b6-85cb-e0d4df91b14e(ozone_datanode_3.ozone_default/172.30.0.3) reported CLOSED replica with index 5.
scm_1  2023-06-13 10:56:28,492 [ReplicationMonitor] INFO replication.ReplicationManager: Sending command [deleteContainerCommand: containerID: 1, replicaIndex: 5, force: false] for container ContainerInfo{id=#1, state=CLOSED, stateEnterTime=2023-06-13T10:56:16.095763Z, pipelineID=PipelineID=2f228c83-dad4-40cd-8415-b153564738de, owner=om1} to 9c7eab65-20fb-49b6-85cb-e0d4df91b14e(ozone_datanode_3.ozone_default/172.30.0.3) with datanode deadline 1686653848492 and scm deadline 1686653878492
scm_1  2023-06-13 10:56:28,493 [ReplicationMonitor] INFO replication.ReplicationManager: Sending command [deleteContainerCommand: containerID: 1, replicaIndex: 4, force: false] for container ContainerInfo{id=#1, state=CLOSED, stateEnterTime=2023-06-13T10:56:16.095763Z, pipelineID=PipelineID=2f228c83-dad4-40cd-8415-b153564738de, owner=om1} to 02968363-81b4-44b9-bd9b-bfb03eb96193(ozone_datanode_1.ozone_default/172.30.0.10) with datanode deadline 1686653848493 and scm deadline 1686653878493
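For reference, the SCM-side rule that produces these log lines can be pictured roughly as below. This is a simplified, hypothetical sketch rather than the real ReplicationManager code: a CLOSED container whose replicas all report zero keys and zero bytes is treated as empty, so delete commands are sent to the replicas and the container moves to DELETING.

// Simplified, hypothetical sketch of the SCM-side behaviour seen in the logs above.
import java.util.List;

class EmptyContainerSketch {

  record Replica(long keyCount, long bytesUsed, String datanode) {}
  record Container(long id, String state, List<Replica> replicas) {}

  interface CommandSender {
    void sendDeleteCommand(long containerId, String datanode, boolean force);
  }

  /** All replicas of a CLOSED container report empty -> schedule deletion. */
  static void process(Container c, CommandSender scm) {
    boolean allEmpty = c.replicas().stream()
        .allMatch(r -> r.keyCount() == 0 && r.bytesUsed() == 0);
    if ("CLOSED".equals(c.state()) && allEmpty) {
      // Each replica gets a non-force deleteContainerCommand, matching
      // the ReplicationMonitor log lines above; the container then sits
      // in DELETING waiting for the datanodes to comply.
      for (Replica r : c.replicas()) {
        scm.sendDeleteCommand(c.id(), r.datanode(), /* force= */ false);
      }
    }
  }
}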
Getting the key info, we can see the key is actually in container 2. So what likely happened is that the client tried to write data to datanode_6 and the parity to datanodes 3 and 1. The stripe was abandoned and re-written to container 2, so the putBlock to update the length likely never made it onto the parity nodes, but some block data did. Now the DNs will not delete the container as it is not empty, and it remains stuck in DELETING forever.
ozone sh key info /vol1/bucket1/key1
{
  "volumeName" : "vol1",
  "bucketName" : "bucket1",
  "name" : "key1",
  "dataSize" : 175,
  "creationTime" : "2023-06-13T10:56:04.158Z",
  "modificationTime" : "2023-06-13T10:56:35.186Z",
  "replicationConfig" : {
    "data" : 3,
    "parity" : 2,
    "ecChunkSize" : 1048576,
    "codec" : "RS",
    "replicationType" : "EC",
    "requiredNodes" : 5
  },
  "metadata" : { },
  "ozoneKeyLocations" : [ {
    "containerID" : 2,
    "localID" : 111677748019200002,
    "length" : 175,
    "offset" : 0,
    "keyOffset" : 0
  } ]
}
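The client behaviour inferred above can be pictured with the rough sketch below. It is not the real Ozone EC client; the Replica interface and method names are made up. It only illustrates that chunk data may already sit on some replicas when a stripe write fails, while the putBlock that would register the block never reaches them, because the whole stripe is retried against a new container.

// Hypothetical illustration of the abandon-and-rewrite behaviour described above.
class EcStripeWriteSketch {

  interface Replica {
    void writeChunk(byte[] chunk) throws Exception;
    void putBlock() throws Exception;  // registers the block in the DN's RocksDB
  }

  static void writeStripe(byte[][] chunks, Replica[] oldGroup, Replica[] newGroup)
      throws Exception {
    try {
      for (int i = 0; i < chunks.length; i++) {
        oldGroup[i].writeChunk(chunks[i]);  // some replicas may already hold data...
      }
      for (Replica r : oldGroup) {
        r.putBlock();
      }
    } catch (Exception failedStripe) {
      // ...but on failure the stripe is abandoned: no putBlock is ever sent to
      // the old group, so the chunks written there become orphan block files,
      // and the whole stripe is rewritten to a new container.
      for (int i = 0; i < chunks.length; i++) {
        newGroup[i].writeChunk(chunks[i]);
      }
      for (Replica r : newGroup) {
        r.putBlock();
      }
    }
  }
}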
Checking the DN storage directories:
bash-4.2$ pwd
/data/hdds/hdds/CID-684c8137-b002-47d8-b134-7831df54e517/current/containerDir0/1
bash-4.2$ ls -al chunks/
total 12
drwxr-xr-x 2 hadoop users 4096 Jun 13 10:56 .
drwxr-xr-x 4 hadoop users 4096 Jun 13 10:56 ..
-rw-r--r-- 1 hadoop users  175 Jun 13 10:56 111677748019200001.block
Dumped the RocksDB with:
ozone debug ldb --db=container.db scan --column_family=block_data -l=-1
The blockID for the file above is not present in the output, so the container is "empty" as far as the metadata is concerned, but has an orphan block file in it.
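A small helper along the lines below can automate the manual check above: it lists the *.block files in the container's chunks directory and flags any whose local ID has no matching entry in the block_data dump. The dump parsing is an assumption based on the ozone debug ldb output used above, not a supported API.

// Hypothetical helper to spot orphan block files: chunk files on disk with no block_data entry.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OrphanBlockFinder {

  public static void main(String[] args) throws IOException {
    Path chunksDir = Paths.get(args[0]);      // e.g. .../containerDir0/1/chunks
    Path blockDataDump = Paths.get(args[1]);  // saved output of the ldb scan above

    // Collect every long number in the dump as a candidate local block ID.
    // (Deliberately permissive: extra matches only reduce false positives.)
    Matcher m = Pattern.compile("\\d{9,}").matcher(Files.readString(blockDataDump));
    Set<String> knownBlocks = m.results()
        .map(MatchResult::group)
        .collect(Collectors.toSet());

    // Any <localID>.block file with no matching block_data entry is an orphan.
    try (Stream<Path> files = Files.list(chunksDir)) {
      files.filter(p -> p.getFileName().toString().endsWith(".block"))
           .filter(p -> {
             String id = p.getFileName().toString().replace(".block", "");
             return !knownBlocks.contains(id);
           })
           .forEach(p -> System.out.println("Orphan block file: " + p));
    }
  }
}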
Issue Links
- causes
  - HDDS-10105 Remove duplicate hdds.datanode. prefix from check.empty.container.dir.on.delete (Resolved)
- is caused by
  - HDDS-8118 Check if container directory is empty before deleting container (Resolved)
- relates to
  - HDDS-8117 Move block files to container directory after the first put block (Open)
  - HDDS-8956 Intermittent timeout in TestDeleteContainerHandler (Resolved)
- links to