  Apache Ozone > HDDS-7759 Improve Ozone Replication Manager > HDDS-8838

Orphan blocks can leave empty container stuck deleting forever


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: None

    Description

      A container can get stuck in the DELETING state in the following scenario:

      Shut down a node and write an EC key shortly afterwards, before the node has been marked stale. In this case, the client can do the following:

      1. Attempts to write to a pipeline that contains the now-stopped node.

      2. Fails to write the stripe and abandons it. The client will seamlessly rewrite the stripe to a new container.

      3. This could be a new container with no other data. It will now have some chunk data written on some of the replicas, but no putBlock will have arrived.

      4. On the DN, the RocksDB will not have an entry for this block, but there will be a block file on disk. The DN will therefore refuse to delete the container if asked.

      5. The stale node handler triggers the close of the pipeline. The replicas are reported to SCM with zero bytes and zero keys and hence the container goes to DELETING after it is CLOSED.

      6. SCM sends delete commands, which are dropped by the DN due to the chunks written, and the container is stuck deleting forever (a simplified sketch of this check follows the list).
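
      The crux is that the DN's notion of "empty", used when deciding whether to honour a non-forced delete, is based on what is physically on disk rather than only on the block entries in RocksDB. The code below is a hypothetical, simplified sketch of that kind of guard (class and method names are made up, this is not the actual Ozone code):

      import java.io.IOException;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.nio.file.Paths;
      import java.util.stream.Stream;

      // Hypothetical sketch: a non-forced container delete is only honoured when
      // the chunks directory holds no files at all, regardless of what the
      // container RocksDB says. An orphan .block file therefore blocks the
      // delete indefinitely.
      public class ContainerDeleteGuard {

        static boolean canDelete(Path chunksDir, boolean force) throws IOException {
          if (force) {
            return true;                          // forced deletes skip the emptiness check
          }
          try (Stream<Path> files = Files.list(chunksDir)) {
            return !files.findAny().isPresent();  // any leftover file => refuse to delete
          }
        }

        public static void main(String[] args) throws IOException {
          Path chunksDir = Paths.get(args[0]);    // e.g. .../containerDir0/1/chunks
          boolean force = args.length > 1 && Boolean.parseBoolean(args[1]);
          System.out.println("delete allowed: " + canDelete(chunksDir, force));
        }
      }

      In this scenario, the leftover 111677748019200001.block file shown further down keeps such a check returning false for the force: false deletes that SCM sends.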

      The following logs show this happening. The key should have gone into container 1, as this was a fresh cluster with no previous writes, but node 6 was stopped. You can see the pipelines that include node_6 being finalized, and container 1 being closed as a result:

      scm_1        2023-06-13 10:56:07,420 [EventQueue-StaleNodeForStaleNodeHandler] INFO node.StaleNodeHandler: Datanode 7c8bf940-3520-41e4-a1b3-672359713c5b(ozone_datanode_6.ozone_default/172.30.0.12) moved to stale state. Finalizing its pipelines [PipelineID=2f228c83-dad4-40cd-8415-b153564738de, PipelineID=59d63d29-64f1-4a95-94cc-e75c65abb94b, PipelineID=dac2f9a5-f274-4baa-9409-3d203b794905]
      scm_1        2023-06-13 10:56:07,435 [EventQueue-StaleNodeForStaleNodeHandler] INFO pipeline.PipelineManagerImpl: Container #1 closed for pipeline=PipelineID=2f228c83-dad4-40cd-8415-b153564738de
      scm_1        2023-06-13 10:56:07,437 [EventQueue-CloseContainerForCloseContainerEventHandler] INFO container.CloseContainerEventHandler: Close container Event triggered for container : #1, current state: CLOSING
      

      Container 1 has moved to CLOSING; then we see it transition to CLOSED and then to DELETING:

      scm_1        2023-06-13 10:56:16,091 [FixedThreadPoolWithAffinityExecutor-9-0] INFO container.IncrementalContainerReportHandler: Moving container #1 to CLOSED state, datanode 9c7eab65-20fb-49b6-85cb-e0d4df91b14e(ozone_datanode_3.ozone_default/172.30.0.3) reported CLOSED replica with index 5.
      scm_1        2023-06-13 10:56:28,492 [ReplicationMonitor] INFO replication.ReplicationManager: Sending command [deleteContainerCommand: containerID: 1, replicaIndex: 5, force: false] for container ContainerInfo{id=#1, state=CLOSED, stateEnterTime=2023-06-13T10:56:16.095763Z, pipelineID=PipelineID=2f228c83-dad4-40cd-8415-b153564738de, owner=om1} to 9c7eab65-20fb-49b6-85cb-e0d4df91b14e(ozone_datanode_3.ozone_default/172.30.0.3) with datanode deadline 1686653848492 and scm deadline 1686653878492
      scm_1        2023-06-13 10:56:28,493 [ReplicationMonitor] INFO replication.ReplicationManager: Sending command [deleteContainerCommand: containerID: 1, replicaIndex: 4, force: false] for container ContainerInfo{id=#1, state=CLOSED, stateEnterTime=2023-06-13T10:56:16.095763Z, pipelineID=PipelineID=2f228c83-dad4-40cd-8415-b153564738de, owner=om1} to 02968363-81b4-44b9-bd9b-bfb03eb96193(ozone_datanode_1.ozone_default/172.30.0.10) with datanode deadline 1686653848493 and scm deadline 1686653878493
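
      For contrast with the DN-side guard sketched earlier, here is a rough, hypothetical sketch of the SCM-side decision from step 5: SCM only sees what the replicas report, so once the container is CLOSED and every replica reports zero keys and zero bytes it looks empty, enters DELETING, and non-forced delete commands like the ones above are queued. Again, the names below are made up and this is only a simplification:

      import java.util.Arrays;
      import java.util.List;

      // Hypothetical sketch of the SCM-side emptiness decision: it is based purely
      // on the key/byte counts the datanodes report for each replica, so orphan
      // chunk files that never got a putBlock are invisible to it.
      public class EmptyContainerCheck {

        static class ReplicaReport {
          final int replicaIndex;
          final long keyCount;
          final long bytesUsed;
          ReplicaReport(int replicaIndex, long keyCount, long bytesUsed) {
            this.replicaIndex = replicaIndex;
            this.keyCount = keyCount;
            this.bytesUsed = bytesUsed;
          }
        }

        static boolean looksEmptyToScm(List<ReplicaReport> reports) {
          return !reports.isEmpty()
              && reports.stream().allMatch(r -> r.keyCount == 0 && r.bytesUsed == 0);
        }

        public static void main(String[] args) {
          // Replicas of container 1 as reported after the pipeline close.
          List<ReplicaReport> reports = Arrays.asList(
              new ReplicaReport(5, 0, 0),
              new ReplicaReport(4, 0, 0));
          if (looksEmptyToScm(reports)) {
            // This is roughly the point where the real system enters DELETING and
            // queues a non-forced deleteContainerCommand per replica.
            System.out.println("container looks empty -> DELETING, send non-forced deletes");
          }
        }
      }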
      

      Getting the key info, we can see the key is actually in container 2. What likely happened is that the client tried to write the data to datanode_6 and the parity to datanodes 3 and 1. The stripe was abandoned and re-written to container 2, so the putBlock to update the length likely never made it onto the parity nodes, but some block data did. Now the DNs will not delete the container, as it is not empty, and it remains stuck in DELETING forever.

      ozone sh key info /vol1/bucket1/key1
      {
        "volumeName" : "vol1",
        "bucketName" : "bucket1",
        "name" : "key1",
        "dataSize" : 175,
        "creationTime" : "2023-06-13T10:56:04.158Z",
        "modificationTime" : "2023-06-13T10:56:35.186Z",
        "replicationConfig" : {
          "data" : 3,
          "parity" : 2,
          "ecChunkSize" : 1048576,
          "codec" : "RS",
          "replicationType" : "EC",
          "requiredNodes" : 5
        },
        "metadata" : { },
        "ozoneKeyLocations" : [ {
          "containerID" : 2,
          "localID" : 111677748019200002,
          "length" : 175,
          "offset" : 0,
          "keyOffset" : 0
        } ]
      }
      

      Checking one of the DN storage directories:

      bash-4.2$ pwd
      /data/hdds/hdds/CID-684c8137-b002-47d8-b134-7831df54e517/current/containerDir0/1
      bash-4.2$ ls -al chunks/
      total 12
      drwxr-xr-x 2 hadoop users 4096 Jun 13 10:56 .
      drwxr-xr-x 4 hadoop users 4096 Jun 13 10:56 ..
      -rw-r--r-- 1 hadoop users  175 Jun 13 10:56 111677748019200001.block
      

      Dumped the container RocksDB with:

      ozone debug ldb --db=container.db scan --column_family=block_data -l=-1
      

      The blockID for the file above is not present, so the container is "empty" as far as the DB is concerned, but it has an orphan block file in it.
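
      To confirm an orphan like this, one approach is to compare the .block file names in the chunks directory against the localIDs present in the block_data column family. The helper below is hypothetical and assumes the localIDs from the ldb dump have already been extracted into a plain text file, one ID per line:

      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.nio.file.Paths;
      import java.util.HashSet;
      import java.util.Set;
      import java.util.stream.Stream;

      // Hypothetical helper: lists <localID>.block files in a container's chunks
      // directory and prints any whose localID does not appear in the supplied
      // list of IDs taken from the container RocksDB dump. Those are orphans.
      public class OrphanBlockFinder {

        public static void main(String[] args) throws Exception {
          Path chunksDir = Paths.get(args[0]);  // e.g. .../containerDir0/1/chunks
          Path dbIdsFile = Paths.get(args[1]);  // localIDs extracted from the ldb dump

          Set<String> dbIds = new HashSet<>(Files.readAllLines(dbIdsFile));

          try (Stream<Path> files = Files.list(chunksDir)) {
            files.map(p -> p.getFileName().toString())
                 .filter(name -> name.endsWith(".block"))
                 .map(name -> name.substring(0, name.length() - ".block".length()))
                 .filter(id -> !dbIds.contains(id))
                 .forEach(id -> System.out.println("orphan block: " + id));
          }
        }
      }

      Run against the directory above with an empty ID list (container 1's block_data has no entries), this would flag 111677748019200001 as the orphan.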

            People

              Assignee: Ashish Kumar (ashishk)
              Reporter: Stephen O'Donnell (sodonnell)
