Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
Description
After deleting a container, all RocksDB entries for that container should be deleted from RocksDB. Instead, the container's metadata and block data entries remain intact in the DB.
The problem appears to stem from the call to
BlockUtils.removeContainerFromDB(containerData, conf)
which does not clear the entries in the datanode's Schema V3 RocksDB for the given container ID.
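For context: in the Schema V3 layout, the containers on a volume share a single RocksDB instance and keys are prefixed with the container ID, so removing a container means deleting a key range rather than dropping a per-container DB. Below is a minimal sketch of that idea against the raw RocksDB Java API, not the actual Ozone code path; the helper name removeContainerEntries, the prefix encoding, and the way column-family handles are obtained are illustrative assumptions. Only the column-family names match the ones the ldb scan further down shows as stale.

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

import org.rocksdb.ColumnFamilyHandle;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

/**
 * Illustrative sketch only: deletes every key carrying a given container-ID
 * prefix from the column families that still hold entries after container
 * delete. Not the real Ozone implementation.
 */
public final class ContainerPrefixCleanup {

  private static final List<String> STALE_FAMILIES =
      Arrays.asList("block_data", "metadata");

  private ContainerPrefixCleanup() { }

  public static void removeContainerEntries(RocksDB db,
      List<ColumnFamilyHandle> handles, String containerPrefix)
      throws RocksDBException {
    byte[] begin = containerPrefix.getBytes(StandardCharsets.UTF_8);
    // Exclusive upper bound: same prefix with the last byte bumped by one
    // (assumes the prefix does not end in 0xFF).
    byte[] end = Arrays.copyOf(begin, begin.length);
    end[end.length - 1]++;

    for (ColumnFamilyHandle handle : handles) {
      String name = new String(handle.getName(), StandardCharsets.UTF_8);
      if (STALE_FAMILIES.contains(name)) {
        // Removes all keys in [begin, end) for this column family.
        db.deleteRange(handle, begin, end);
      }
    }
  }
}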
We can reproduce this issue on a docker cluster as follows:
- start a docker cluster with 5 datanodes
- put a key under a bucket to create a container
- close the container
- decommission 2 of the datanodes that hold replicas of the container
- recommission the datanodes
- container should be over-replicated
- ReplicationManager should issue a delete container command to 2 of the datanodes
- Check one of the two datanodes
- Container should be deleted
- Check RocksDB block data entries for the container
On master, from the Ozone source root:
❯ cd hadoop-ozone/dist/target/ozone-1.4.0-SNAPSHOT/compose/ozone
Edit docker-config and add the two configs below, which are needed for decommission:
OZONE-SITE.XML_ozone.scm.nodes.scmservice=scm
OZONE-SITE.XML_ozone.scm.address.scmservice.scm=scm
Start the Ozone cluster with 5 datanodes, connect to SCM, and create a key:
❯ docker-compose up --scale datanode=5 -d
❯ docker exec -it ozone_scm_1 bash
bash-4.2$ ozone sh volume create /vol1
bash-4.2$ ozone sh bucket create /vol1/bucket1
bash-4.2$ ozone sh key put /vol1/bucket1/key1 /etc/hosts
Close the container and check which datanodes it is on:
bash-4.2$ ozone admin container close 1
bash-4.2$ ozone admin container info 1
...
Check the SCM roles to get the SCM IP and port:
bash-4.2$ ozone admin scm roles
99960cfeda73:9894:LEADER:62393063-a1e0-4d5e-bcf5-938cf09a9511:172.25.0.4
Check the datanode list to get the IP and hostname of the 2 datanodes the container is on:
bash-4.2$ ozone admin datanode list
...
Decommission both datanodes:
bash-4.2$ ozone admin datanode decommission -id=scmservice --scm=172.25.0.4:9894 <datanodeIP>/<datanodeHostname>
Wait until both datanodes are decommissioned. At that point, checking the container's info shows that it also has replicas placed on other datanodes.
Recommission both datanodes:
bash-4.2$ ozone admin datanode recommission -id=scmservice --scm=172.25.0.4:9894 <datanodeIP>/<datanodeHostname>
After a few minutes, the SCM logs show:
2023-03-15 18:24:53,810 [ReplicationMonitor] INFO replication.LegacyReplicationManager: Container #1 is over replicated. Expected replica count is 3, but found 5.
2023-03-15 18:24:53,810 [ReplicationMonitor] INFO replication.LegacyReplicationManager: Sending delete container command for container #1 to datanode d6461c13-c2fa-4437-94f5-f75010a49069(ozone_datanode_2.ozone_default/172.25.0.11)
2023-03-15 18:24:53,811 [ReplicationMonitor] INFO replication.LegacyReplicationManager: Sending delete container command for container #1 to datanode 6b077eea-543b-47ca-abf2-45f26c106903(ozone_datanode_5.ozone_default/172.25.0.6)
Connect to one of those two datanodes and check that the container has been deleted:
bash-4.2$ ls /data/hdds/hdds/CID-ca9fef0f-9af2-4dbf-af02-388d624c2f10/current/containerDir0/
bash-4.2$
Check RocksDB:
bash-4.2$ ozone debug ldb --db /data/hdds/hdds/CID-ca9fef0f-9af2-4dbf-af02-388d624c2f10/DS-a8a72696-e4cf-42a6-a66c-04f0b614fde4/container.db scan --column-family=block_data
Block data for the deleted container is still there:
"blockID": { "containerBlockID": { "containerID": 1, "localID": 111677748019200001
The metadata and block_data column families still have the entries, while deleted_blocks and delete_txns are empty.
I've also attached a diff with a test added under TestContainerPersistence that verifies the above issue.
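For readers without the attachment, a rough sketch of the shape such a test could take is below. It is written against a bare RocksDB instance and the hypothetical removeContainerEntries helper from the sketch above, not the real TestContainerPersistence fixtures, so the key format and setup are assumptions.

import static org.junit.jupiter.api.Assertions.assertNull;

import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.io.TempDir;
import org.rocksdb.ColumnFamilyDescriptor;
import org.rocksdb.ColumnFamilyHandle;
import org.rocksdb.DBOptions;
import org.rocksdb.RocksDB;

/** Hypothetical test shape: after container cleanup, no prefixed keys remain. */
class ContainerPrefixCleanupTest {

  @TempDir
  Path dbDir;

  @Test
  void entriesAreGoneAfterContainerDelete() throws Exception {
    List<ColumnFamilyDescriptor> descriptors = Arrays.asList(
        new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY),
        new ColumnFamilyDescriptor("block_data".getBytes(StandardCharsets.UTF_8)),
        new ColumnFamilyDescriptor("metadata".getBytes(StandardCharsets.UTF_8)));
    List<ColumnFamilyHandle> handles = new ArrayList<>();

    try (DBOptions options = new DBOptions()
             .setCreateIfMissing(true)
             .setCreateMissingColumnFamilies(true);
         RocksDB db = RocksDB.open(options, dbDir.toString(), descriptors, handles)) {

      String prefix = "containerID-1|";  // assumed prefix format, for illustration
      byte[] key = (prefix + "block-111677748019200001")
          .getBytes(StandardCharsets.UTF_8);
      // Simulate a block_data entry belonging to container 1.
      db.put(handles.get(1), key, "blockData".getBytes(StandardCharsets.UTF_8));

      ContainerPrefixCleanup.removeContainerEntries(db, handles, prefix);

      // The container's block_data entry must no longer be readable.
      assertNull(db.get(handles.get(1), key));
    }
  }
}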