Apache Ozone / HDDS-9146

Potential data loss with HSync due to a deletedTable entry sharing the same block as a keyTable entry


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: None

    Description

      It was observed that when hsync() is called followed by close() on a key stream (which triggers two OMKeyCommitRequests, the first with isHSync = true and the second with isHSync = false), deletedTable can end up with an entry that has exactly the same block conID (container ID) and locID (local ID) as the committed key in keyTable. This can cause OM's KeyDeletingService to ask SCM to delete the committed block by mistake.

      The catch is that actual data loss won't happen until the container is closed; only then is the block actually deleted on the DataNodes (DNs). CMIIW, erose.
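      To make the sequence above concrete, here is a minimal Python model. It assumes, as a simplification, that each commit replaces the keyTable entry and that a commit which finds an existing keyTable entry treats it as an overwritten key, moving the old entry's blocks into deletedTable. The names BlockID, OmModel and commit_key are hypothetical and do not correspond to Ozone's actual classes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BlockID:
    """Hypothetical block identifier: container ID, local ID and bcsId."""
    con_id: int
    loc_id: int
    bcs_id: int


@dataclass
class OmModel:
    """Toy model of OM's keyTable and deletedTable (not Ozone's real API)."""
    key_table: dict = field(default_factory=dict)      # key name -> list of BlockID
    deleted_table: list = field(default_factory=list)  # (key name, list of BlockID)

    def commit_key(self, key: str, blocks: list) -> None:
        # Assumption: an existing entry is treated as an overwritten key and
        # all of its blocks are queued for deletion, even if the new entry
        # still references the very same blocks.
        old = self.key_table.get(key)
        if old is not None:
            self.deleted_table.append((key, old))
        self.key_table[key] = blocks


om = OmModel()
# hsync(): first OMKeyCommitRequest (isHSync = true)
om.commit_key("part-m-00001", [BlockID(1, 111677748019200001, 0)])
# close(): second OMKeyCommitRequest (isHSync = false), same block, later bcsId
om.commit_key("part-m-00001", [BlockID(1, 111677748019200001, 2)])

live = {(b.con_id, b.loc_id) for b in om.key_table["part-m-00001"]}
doomed = {(b.con_id, b.loc_id) for _, blocks in om.deleted_table for b in blocks}
print(live & doomed)  # {(1, 111677748019200001)}
```

      In this model the block that the live key still needs is also queued for deletion, which is the state the test log below shows.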

      Repro integration test branch (based on erose's integration test, which was in turn based on my initial draft):

      https://github.com/smengcl/hadoop-ozone/tree/HDDS-9146-repro

      Run the integration test TestMiniOzoneCluster#testKeyRenameDirDelete to reproduce:

      Test log. Note that the entries in keyTable and deletedTable share the same block (conID: 1, locID: 111677748019200001):
      2023-08-09 14:31:54,859 [main] WARN  ozone.TestMiniOzoneCluster (TestMiniOzoneCluster.java:testKeyRenameDirDelete(159)) - keyTable:     ----- START -----
      2023-08-09 14:31:54,860 [main] WARN  ozone.TestMiniOzoneCluster (TestMiniOzoneCluster.java:testKeyRenameDirDelete(168)) - keyTable:     key = /testozonevol/testozonebucket/inputTera/_temporary/1/_temporary/attempt_1691047336995_0006_m_000001_0/part-m-00001, val = OmKeyInfo{volumeName='testozonevol', bucketName='testozonebucket', keyName='inputTera/_temporary/1/_temporary/attempt_1691047336995_0006_m_000001_0/part-m-00001', dataSize=11, keyLocationVersions=[OmKeyLocationInfoGroup{version=0, locationVersionMap={0=[{blockID={conID: 1 locID: 111677748019200001 bcsId: 2}, length=11, offset=0, token=null, pipeline=null, createVersion=0, partNumber=0}]}, isMultipartKey=false}], creationTime=1691616714661, modificationTime=1691616714848, replicationConfig=RATIS/THREE, encInfo=null, fileChecksum=null, isFile=true, fileName='part-m-00001'}
      2023-08-09 14:31:54,860 [main] WARN  ozone.TestMiniOzoneCluster (TestMiniOzoneCluster.java:testKeyRenameDirDelete(171)) - keyTable:     -----  END  -----
      2023-08-09 14:31:54,860 [main] WARN  ozone.TestMiniOzoneCluster (TestMiniOzoneCluster.java:testKeyRenameDirDelete(173)) - deletedTable: ----- START -----
      2023-08-09 14:31:54,861 [main] WARN  ozone.TestMiniOzoneCluster (TestMiniOzoneCluster.java:testKeyRenameDirDelete(181)) - deletedTable: key = /testozonevol/testozonebucket/inputTera/_temporary/1/_temporary/attempt_1691047336995_0006_m_000001_0/part-m-00001/-9223372036854774528, val = RepeatedOmKeyInfo{omKeyInfoList=[OmKeyInfo{volumeName='testozonevol', bucketName='testozonebucket', keyName='inputTera/_temporary/1/_temporary/attempt_1691047336995_0006_m_000001_0/part-m-00001', dataSize=11, keyLocationVersions=[OmKeyLocationInfoGroup{version=0, locationVersionMap={0=[{blockID={conID: 1 locID: 111677748019200001 bcsId: 0}, length=11, offset=0, token=null, pipeline=null, createVersion=0, partNumber=0}]}, isMultipartKey=false}], creationTime=1691616714661, modificationTime=1691616714834, replicationConfig=RATIS/THREE, encInfo=null, fileChecksum=null, isFile=true, fileName='part-m-00001'}]}
      2023-08-09 14:31:54,861 [main] WARN  ozone.TestMiniOzoneCluster (TestMiniOzoneCluster.java:testKeyRenameDirDelete(184)) - deletedTable: -----  END  -----
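      The overlap in the log above can be checked mechanically. The Python sketch below takes (conID, locID, bcsId) triples, transcribed by hand from the two log entries, and reports blocks that are both referenced by a keyTable entry and queued in deletedTable. It compares only conID and locID, because the two entries differ in bcsId (2 vs 0) yet refer to the same block. The function name shared_blocks is hypothetical.

```python
def shared_blocks(key_table_blocks, deleted_table_blocks):
    """Return sorted (conID, locID) pairs referenced by both tables.

    Each argument is an iterable of (con_id, loc_id, bcs_id) tuples.
    bcs_id is ignored on purpose: in the log above it differs between
    the two records of the same block.
    """
    live = {(con_id, loc_id) for con_id, loc_id, _ in key_table_blocks}
    pending = {(con_id, loc_id) for con_id, loc_id, _ in deleted_table_blocks}
    return sorted(live & pending)


# Values transcribed from the test log above.
key_table_blocks = [(1, 111677748019200001, 2)]
deleted_table_blocks = [(1, 111677748019200001, 0)]

print(shared_blocks(key_table_blocks, deleted_table_blocks))
# [(1, 111677748019200001)]
```

      A repro test could, for example, assert that this list is empty after close().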
      

      It sounds to me like the fix should be to filter out any block that shares the same containerID and localID as the keyTable/fileTable entry when adding to deletedTable in OMKeyCommitRequest / OMKeyCommitRequestWithFSO. But I'm no expert in HSync, so please advise. cc weichiu szetszwo
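      A minimal sketch of the filtering proposed above, written in Python for brevity. It assumes the blocks of the previous entry and of the newly committed entry are available as (conID, locID, bcsId) tuples; blocks_to_delete is a hypothetical helper, not a method of OMKeyCommitRequest or OMKeyCommitRequestWithFSO.

```python
def blocks_to_delete(old_blocks, committed_blocks):
    """Return the blocks of the old entry that may go to deletedTable.

    A block is kept out of deletedTable if the newly committed
    keyTable/fileTable entry still references the same (conID, locID),
    regardless of bcsId.
    """
    still_referenced = {(con_id, loc_id) for con_id, loc_id, _ in committed_blocks}
    return [b for b in old_blocks if (b[0], b[1]) not in still_referenced]


# The first block comes from the log; the second is a hypothetical block
# that the new entry no longer references.
old = [(1, 111677748019200001, 0), (1, 111677748019200002, 0)]
new = [(1, 111677748019200001, 2)]
print(blocks_to_delete(old, new))  # [(1, 111677748019200002, 0)]
```

      In this model, if the filtered list is empty, nothing needs to be added to deletedTable for that key at all.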

People

    • Assignee: Siyao Meng (smeng)
    • Reporter: Siyao Meng (smeng)
    • Votes: 0
    • Watchers: 3

              Dates

                Created:
                Updated:
                Resolved: