Apache Ozone / HDDS-8173

SchemaV3 RocksDB entries are not removed after container delete


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: db

    Description

      After a container is deleted, all of its RocksDB entries should be removed as well. Instead, the container's metadata and block data remain intact in the DB.

      The problem appears to stem from the call to

      BlockUtils.removeContainerFromDB(containerData, conf)

      which does not clear the entries in the datanode SchemaV3 RocksDB for the given container ID.
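
      For context, SchemaV3 keeps the entries of all containers on a volume in one shared RocksDB instance, with every key prefixed by the container ID, so a correct container delete has to remove that whole key range rather than drop a per-container DB. As a rough illustration of the kind of cleanup involved (not the actual Ozone code), here is a sketch against the plain RocksDB Java API, assuming for simplicity that keys are prefixed with the container ID as a fixed-width big-endian long:

      import java.nio.ByteBuffer;

      import org.rocksdb.ColumnFamilyHandle;
      import org.rocksdb.RocksDB;
      import org.rocksdb.RocksDBException;

      public final class ContainerPrefixCleanup {

        // Deletes every key of one column family that belongs to the given
        // container, assuming keys are prefixed with the container ID encoded
        // as a fixed-width big-endian long (hypothetical encoding, for
        // illustration only; Ozone's real key encoding may differ).
        static void dropContainerEntries(RocksDB db, ColumnFamilyHandle cf,
            long containerId) throws RocksDBException {
          byte[] begin = ByteBuffer.allocate(Long.BYTES).putLong(containerId).array();
          byte[] end = ByteBuffer.allocate(Long.BYTES).putLong(containerId + 1).array();
          // deleteRange removes [begin, end), i.e. all keys carrying this prefix.
          db.deleteRange(cf, begin, end);
        }
      }

      The same range delete would need to run for each per-container column family (block_data, metadata, and so on) to fully clear the container.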

       

      We can reproduce this issue on a docker cluster as follows:

      • start a docker cluster with 5 datanodes
      • put a key under a bucket to create a container
      • close the container
      • decommission 2 of the datanodes that hold replicas of the container
      • recommission the datanodes
      • the container should now be over-replicated
      • ReplicationManager should issue a container delete to 2 datanodes
      • check one of those two datanodes
      • the container should be deleted
      • check the RocksDB block data entries for the container

      on master, from the Ozone repository root:

      ❯ cd hadoop-ozone/dist/target/ozone-1.4.0-SNAPSHOT/compose/ozone 

      edit docker-config and add the two configs below, which are needed for decommission:

      OZONE-SITE.XML_ozone.scm.nodes.scmservice=scm
      OZONE-SITE.XML_ozone.scm.address.scmservice.scm=scm 

      start the Ozone cluster with 5 datanodes, connect to SCM, and create a key:

      ❯ docker-compose up --scale datanode=5 -d
      
      ❯ docker exec -it ozone_scm_1 bash
      bash-4.2$ ozone sh volume create /vol1   
      bash-4.2$ ozone sh bucket create /vol1/bucket1
      bash-4.2$ ozone sh key put /vol1/bucket1/key1 /etc/hosts

      close the container and check which datanodes it's on:

      bash-4.2$ ozone admin container close 1
      bash-4.2$ ozone admin container info 1 
      ...
      

      check scm roles to get the SCM IP and port:

      bash-4.2$ ozone admin scm roles
      99960cfeda73:9894:LEADER:62393063-a1e0-4d5e-bcf5-938cf09a9511:172.25.0.4 

       

      check the datanode list to get the IP and hostname of the 2 datanodes the container is on:

      bash-4.2$ ozone admin datanode list
      ... 

      place both datanodes in decommission:

      bash-4.2$ ozone admin datanode decommission -id=scmservice --scm=172.25.0.4:9894 <datanodeIP>/<datanodeHostname> 

      wait until both datanodes are decommissioned; at that point, checking the container's info shows that it also has replicas on other datanodes

      recommission both datanodes:

      bash-4.2$ ozone admin datanode recommission -id=scmservice --scm=172.25.0.4:9894 <datanodeIP>/<datanodeHostname>  

      After a few minutes, the scm logs show:

      2023-03-15 18:24:53,810 [ReplicationMonitor] INFO replication.LegacyReplicationManager: Container #1 is over replicated. Expected replica count is 3, but found 5.
      2023-03-15 18:24:53,810 [ReplicationMonitor] INFO replication.LegacyReplicationManager: Sending delete container command for container #1 to datanode d6461c13-c2fa-4437-94f5-f75010a49069(ozone_datanode_2.ozone_default/172.25.0.11)
      2023-03-15 18:24:53,811 [ReplicationMonitor] INFO replication.LegacyReplicationManager: Sending delete container command for container #1 to datanode 6b077eea-543b-47ca-abf2-45f26c106903(ozone_datanode_5.ozone_default/172.25.0.6) 

      connect to one of the datanodes where the container is being deleted

      check that the container is deleted

      bash-4.2$ ls /data/hdds/hdds/CID-ca9fef0f-9af2-4dbf-af02-388d624c2f10/current/containerDir0/
      bash-4.2$  

      check RocksDB

      bash-4.2$ ozone debug ldb --db /data/hdds/hdds/CID-ca9fef0f-9af2-4dbf-af02-388d624c2f10/DS-a8a72696-e4cf-42a6-a66c-04f0b614fde4/container.db scan --column-family=block_data 

      Block data for the deleted container is still there:

        "blockID": {
          "containerBlockID": {
            "containerID": 1,
            "localID": 111677748019200001 

      The metadata and block_data column families still have the entries, while deleted_blocks and delete_txns are empty.
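
      As a programmatic double-check outside the ozone debug CLI, the block_data column family can also be scanned read-only with the plain RocksDB Java API; a minimal sketch (pass the same container.db path used above):

      import java.nio.charset.StandardCharsets;
      import java.util.ArrayList;
      import java.util.List;

      import org.rocksdb.ColumnFamilyDescriptor;
      import org.rocksdb.ColumnFamilyHandle;
      import org.rocksdb.DBOptions;
      import org.rocksdb.Options;
      import org.rocksdb.RocksDB;
      import org.rocksdb.RocksIterator;

      public final class ScanBlockData {
        public static void main(String[] args) throws Exception {
          RocksDB.loadLibrary();
          String path = args[0]; // the container.db path from above

          // Open read-only with every existing column family.
          List<ColumnFamilyDescriptor> cfs = new ArrayList<>();
          try (Options opts = new Options()) {
            for (byte[] name : RocksDB.listColumnFamilies(opts, path)) {
              cfs.add(new ColumnFamilyDescriptor(name));
            }
          }
          List<ColumnFamilyHandle> handles = new ArrayList<>();
          try (DBOptions dbOpts = new DBOptions();
               RocksDB db = RocksDB.openReadOnly(dbOpts, path, cfs, handles)) {
            for (ColumnFamilyHandle handle : handles) {
              if (!"block_data".equals(
                  new String(handle.getName(), StandardCharsets.UTF_8))) {
                continue;
              }
              // Any surviving key here belongs to a container whose entries
              // were not cleaned up. Keys may carry a binary container-ID
              // prefix, so the UTF-8 rendering below is lossy but inspectable.
              try (RocksIterator it = db.newIterator(handle)) {
                for (it.seekToFirst(); it.isValid(); it.next()) {
                  System.out.println(new String(it.key(), StandardCharsets.UTF_8));
                }
              }
            }
          }
        }
      }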

       

      I've also attached a diff with a test added under TestContainerPersistence that verifies the above issue.
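
      For reference, the test is roughly along the following lines. This is only a sketch written in the style of TestContainerPersistence; the fixture and helper names used here (addContainer, blockManager, containerSet, DBHandle, getBlockIterator) are assumptions, not the exact contents of rocksDBContainerDelete.diff:

      // Sketch only: assumes the conf, containerSet, and blockManager fixtures
      // of TestContainerPersistence; helper names are illustrative.
      @Test
      public void testContainerDeleteRemovesSchemaV3Entries() throws Exception {
        long testContainerID = getTestContainerID();
        KeyValueContainer container = addContainer(containerSet, testContainerID);

        // Put one block so block_data holds at least one entry for the container.
        BlockID blockID = ContainerTestHelper.getTestBlockID(testContainerID);
        blockManager.putBlock(container, new BlockData(blockID));
        container.close();

        // Delete the container and drop it from the in-memory container set.
        container.delete();
        containerSet.removeContainer(testContainerID);

        // The shared SchemaV3 DB should no longer hold block data for this
        // container; before the fix, the iterator still returns the block above.
        try (DBHandle db = BlockUtils.getDB(container.getContainerData(), conf);
             BlockIterator<BlockData> iter =
                 db.getStore().getBlockIterator(testContainerID)) {
          assertFalse(iter.hasNext());
        }
      }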

      Attachments

        1. rocksDBContainerDelete.diff
          4 kB
          Christos Bisias


            People

              Assignee: Hemant Kumar (hemantk)
              Reporter: Christos Bisias (xbis)
              Votes: 0
              Watchers: 7
