Apache Ozone / HDDS-9043

[snapshot] Distcp throws DuplicateFileException when files are deleted in source directory


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Ozone Manager

    Description

      Steps:

      1. Create the source volume, bucket and key (vola1/bucka1/key1).
      2. Create the destination volume and bucket (vola2/bucka2).
      3. Run a base (full) DistCp replication from source to destination.
      4. Create snapshot snap1 on both the source and destination buckets.
      5. Delete the key from the source bucket (it is moved to .Trash), then create snapshot snap2.
      6. Run a snapshot-diff DistCp from the source to the destination bucket using snap1 and snap2.
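      The steps above can be scripted for re-testing. This is a minimal sketch, not the reporter's actual commands: the report does not show the commands for steps 1 to 5, so the helper name repro_commands, the local file /tmp/key1, the use of -update for the base copy, and the assumption that the default OM service is ozone1 with trash enabled (fs.trash.interval > 0, as the .Trash entries after step 5 suggest) are all illustrative. By default it only prints the commands; pass --run to execute them.

```python
"""Hypothetical reproduction helper for HDDS-9043 (sketch, not from the report)."""
import shlex
import subprocess
import sys

SERVICE = "ozone1"  # OM service id taken from the ofs:// URIs in the report


def ofs(path):
    """Build an ofs:// URI for a volume/bucket[/key] path."""
    return f"ofs://{SERVICE}/{path}"


def repro_commands(src="vola1/bucka1", dst="vola2/bucka2", key="key1",
                   local_file="/tmp/key1"):
    """Return the argv lists for steps 1-6, in order (assumed commands)."""
    src_vol = src.split("/")[0]
    dst_vol = dst.split("/")[0]
    return [
        # Steps 1-2: source volume/bucket/key, destination volume/bucket
        ["ozone", "sh", "volume", "create", f"/{src_vol}"],
        ["ozone", "sh", "bucket", "create", f"/{src}"],
        ["ozone", "sh", "key", "put", f"/{src}/{key}", local_file],
        ["ozone", "sh", "volume", "create", f"/{dst_vol}"],
        ["ozone", "sh", "bucket", "create", f"/{dst}"],
        # Step 3: base copy; -update (assumed) puts key1 directly under dst
        ["hadoop", "distcp", "-update", ofs(src), ofs(dst)],
        # Step 4: snap1 on both buckets
        ["ozone", "sh", "snapshot", "create", f"/{src}", "snap1"],
        ["ozone", "sh", "snapshot", "create", f"/{dst}", "snap1"],
        # Step 5: delete (moved to .Trash when trash is enabled), then snap2
        ["ozone", "fs", "-rm", ofs(f"{src}/{key}")],
        ["ozone", "sh", "snapshot", "create", f"/{src}", "snap2"],
        # Step 6: incremental copy driven by the snap1 -> snap2 diff
        ["hadoop", "distcp", "-update", "-diff", "snap1", "snap2",
         ofs(src), ofs(dst)],
    ]


if __name__ == "__main__":
    run = "--run" in sys.argv
    for argv in repro_commands():
        print(shlex.join(argv))
        if run:
            # check=True stops at the first failing command (expected: step 6)
            subprocess.run(argv, check=True)
```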

      Filesystem after step 3:

      [root@quasar-vebabo-1 ~]# ozone fs -ls -R ofs://ozone1/vola*
      drwxrwxrwx   - systest systest          0 2023-07-19 07:19 ofs://ozone1/vola1/bucka1
      -rw-rw-rw-   3 systest systest        672 2023-07-19 07:19 ofs://ozone1/vola1/bucka1/key1
      drwxrwxrwx   - systest systest          0 2023-07-19 07:20 ofs://ozone1/vola2/bucka2
      -rw-rw-rw-   3 systest systest        672 2023-07-19 07:21 ofs://ozone1/vola2/bucka2/key1 

      Filesystem after step 5:

      [root@quasar-vebabo-1 ~]# ozone fs -ls -R ofs://ozone1/vola*
      drwxrwxrwx   - systest systest          0 2023-07-19 07:19 ofs://ozone1/vola1/bucka1
      drwxrwxrwx   - systest systest          0 2023-07-19 07:23 ofs://ozone1/vola1/bucka1/.Trash
      drwxrwxrwx   - systest systest          0 2023-07-19 07:23 ofs://ozone1/vola1/bucka1/.Trash/systest
      drwxrwxrwx   - systest systest          0 2023-07-19 07:23 ofs://ozone1/vola1/bucka1/.Trash/systest/Current
      -rw-rw-rw-   3 systest systest        672 2023-07-19 07:19 ofs://ozone1/vola1/bucka1/.Trash/systest/Current/key1
      drwxrwxrwx   - systest systest          0 2023-07-19 07:20 ofs://ozone1/vola2/bucka2
      -rw-rw-rw-   3 systest systest        672 2023-07-19 07:21 ofs://ozone1/vola2/bucka2/key1 

      Filesystem after step 6:

      [root@quasar-vebabo-1 ~]# ozone fs -ls -R ofs://ozone1/vola*
      drwxrwxrwx   - systest systest          0 2023-07-19 07:19 ofs://ozone1/vola1/bucka1
      drwxrwxrwx   - systest systest          0 2023-07-19 07:23 ofs://ozone1/vola1/bucka1/.Trash
      drwxrwxrwx   - systest systest          0 2023-07-19 07:23 ofs://ozone1/vola1/bucka1/.Trash/systest
      drwxrwxrwx   - systest systest          0 2023-07-19 07:23 ofs://ozone1/vola1/bucka1/.Trash/systest/Current
      -rw-rw-rw-   3 systest systest        672 2023-07-19 07:19 ofs://ozone1/vola1/bucka1/.Trash/systest/Current/key1
      drwxrwxrwx   - systest systest          0 2023-07-19 07:20 ofs://ozone1/vola2/bucka2
      drwxrwxrwx   - systest systest          0 2023-07-19 07:27 ofs://ozone1/vola2/bucka2/.Trash
      drwxrwxrwx   - systest systest          0 2023-07-19 07:27 ofs://ozone1/vola2/bucka2/.Trash/systest
      drwxrwxrwx   - systest systest          0 2023-07-19 07:27 ofs://ozone1/vola2/bucka2/.Trash/systest/Current
      -rw-rw-rw-   3 systest systest        672 2023-07-19 07:21 ofs://ozone1/vola2/bucka2/.Trash/systest/Current/key1 

      DistCp command output:

      [root@quasar-vebabo-1 ~]# hadoop distcp -update -diff snap1 snap2 ofs://ozone1/vola1/bucka1 ofs://ozone1/vola2/bucka2
      23/07/19 07:26:20 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=true, useRdiff=false, fromSnapshot=snap1, toSnapshot=snap2, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, copyStrategy='uniformsize', preserveStatus=[], atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[ofs://ozone1/vola1/bucka1], targetPath=ofs://ozone1/vola2/bucka2, filtersFile='null', blocksPerChunk=0, copyBufferSize=8192, verboseLog=false, directWrite=false, useiterator=false}, sourcePaths=[ofs://ozone1/vola1/bucka1], targetPathExists=true, preserveRawXattrsfalse
      23/07/19 07:27:22 INFO kms.KMSClientProvider: New token created: (Kind: kms-dt, Service: kms://https@quasar-vebabo-1.quasar-vebabo.root.hwx.site:9494/kms, Ident: (kms-dt owner=systest, renewer=yarn, realUser=, issueDate=1689751642718, maxDate=1690356442718, sequenceNumber=9, masterKeyId=2))
      23/07/19 07:27:22 INFO security.TokenCache: Got dt for ofs://ozone1; Kind: OzoneToken, Service: 172.27.128.65:9862,172.27.191.208:9862,172.27.204.65:9862, Ident: (OzoneToken owner=systest@ROOT.HWX.SITE, renewer=yarn, realUser=, issueDate=2023-07-19T07:27:22.313Z, maxDate=2023-07-26T07:27:22.313Z, sequenceNumber=5, masterKeyId=1, strToSign=null, signature=null, awsAccessKeyId=null, omServiceId=ozone1, omCertSerialId=52311743208636877)
      23/07/19 07:27:22 INFO security.TokenCache: Got dt for ofs://ozone1; Kind: kms-dt, Service: kms://https@quasar-vebabo-1.quasar-vebabo.root.hwx.site;quasar-vebabo-2.quasar-vebabo.root.hwx.site:9494/kms, Ident: (kms-dt owner=systest, renewer=yarn, realUser=, issueDate=1689751642718, maxDate=1690356442718, sequenceNumber=9, masterKeyId=2)
      23/07/19 07:27:23 INFO tools.SimpleCopyListing: Starting: Building listing using multi threaded approach for ofs://ozone1/vola1/bucka1/.snapshot/snap2
      23/07/19 07:27:23 INFO tools.SimpleCopyListing: Building listing using multi threaded approach for ofs://ozone1/vola1/bucka1/.snapshot/snap2: duration 0:00.067s
      23/07/19 07:27:23 INFO tools.SimpleCopyListing: Starting: Building listing using multi threaded approach for ofs://ozone1/vola1/bucka1/.snapshot/snap2
      23/07/19 07:27:23 INFO tools.SimpleCopyListing: Building listing using multi threaded approach for ofs://ozone1/vola1/bucka1/.snapshot/snap2: duration 0:00.019s
      23/07/19 07:27:23 INFO tools.SimpleCopyListing: Starting: Building listing using multi threaded approach for ofs://ozone1/vola1/bucka1/.snapshot/snap2
      23/07/19 07:27:23 INFO tools.SimpleCopyListing: Building listing using multi threaded approach for ofs://ozone1/vola1/bucka1/.snapshot/snap2: duration 0:00.012s
      23/07/19 07:27:23 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
      23/07/19 07:27:23 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
      23/07/19 07:27:23 ERROR tools.DistCp: Duplicate files in input path:
      org.apache.hadoop.tools.CopyListing$DuplicateFileException: File ofs://ozone1/vola1/bucka1/.snapshot/snap2/.Trash/systest and ofs://ozone1/vola1/bucka1/.snapshot/snap2/.Trash/systest would cause duplicates. Aborting
          at org.apache.hadoop.tools.CopyListing.validateFinalListing(CopyListing.java:175)
          at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:93)
          at org.apache.hadoop.tools.DistCp.createInputFileListingWithDiff(DistCp.java:397)
          at org.apache.hadoop.tools.DistCp.prepareFileListing(DistCp.java:89)
          at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:216)
          at org.apache.hadoop.tools.DistCp.execute(DistCp.java:193)
          at org.apache.hadoop.tools.DistCp.run(DistCp.java:155)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:81)
          at org.apache.hadoop.tools.DistCp.main(DistCp.java:445) 
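      The exception comes from CopyListing.validateFinalListing, which appears to reject two listing entries with the same relative path, and the log shows three separate listing passes over .snapshot/snap2. One plausible reading, not confirmed in this report, is that the snap1 to snap2 diff reports .Trash, .Trash/systest and .Trash/systest/Current each as created, and each is then traversed recursively, so nested paths are listed more than once. The toy Python model below uses an assumed diff and tree and hypothetical names (expand, build_listing, prune_nested); it is not Hadoop or Ozone code. prune_nested is an illustrative mitigation only, not the patch attached to this issue.

```python
"""Toy model of duplicate entries in a snapshot-diff copy listing (assumption-based)."""
from typing import Dict, List

# Snapshot tree of bucka1 at snap2, taken from the listing after step 5.
SNAP2_TREE: Dict[str, List[str]] = {
    ".Trash": [".Trash/systest"],
    ".Trash/systest": [".Trash/systest/Current"],
    ".Trash/systest/Current": [".Trash/systest/Current/key1"],
}

# ASSUMED "created" entries in the snap1 -> snap2 diff; the report only shows
# three listing passes over snap2, not the diff itself.
CREATED = [".Trash", ".Trash/systest", ".Trash/systest/Current"]


class DuplicateFileError(Exception):
    """Stand-in for DistCp's CopyListing.DuplicateFileException."""


def expand(path: str, tree: Dict[str, List[str]]) -> List[str]:
    """Return path plus everything beneath it (recursive traversal)."""
    out = [path]
    for child in tree.get(path, []):
        out.extend(expand(child, tree))
    return out


def build_listing(created: List[str], tree: Dict[str, List[str]]) -> List[str]:
    """Traverse every created entry, sort, and reject adjacent duplicates."""
    listing = sorted(p for c in created for p in expand(c, tree))
    for prev, cur in zip(listing, listing[1:]):
        if prev == cur:
            raise DuplicateFileError(
                f"File {prev} and {cur} would cause duplicates. Aborting")
    return listing


def prune_nested(created: List[str]) -> List[str]:
    """Hypothetical mitigation: drop entries whose ancestor is also created."""
    return [c for c in created
            if not any(c.startswith(a + "/") for a in created)]


if __name__ == "__main__":
    try:
        build_listing(CREATED, SNAP2_TREE)
    except DuplicateFileError as err:
        print(err)  # File .Trash/systest and .Trash/systest would cause duplicates. Aborting
    print(build_listing(prune_nested(CREATED), SNAP2_TREE))
```

In this model, sorting places the two copies of .Trash/systest next to each other before the repeated .Trash/systest/Current entries, which matches the path named in the exception above.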


    People

      Assignee: Sadanand Shenoy (sadanand_shenoy)
      Reporter: Jyotirmoy Sinha (jyosin)
      Votes: 0
      Watchers: 2
