Hadoop HDFS / HDFS-16145

CopyListing fails with FileNotFoundException when using snapshot diff


    Details

      Description

      DistCp with snapshot diff and filters marks a rename as a delete operation on the target if the rename target is a directory excluded by the filter. However, files or subdirectories that were created or modified under the renamed directory after the old snapshot, but before the rename, still appear as created/modified entries in the final copy list. Since their parent directory is marked for deletion, these create/modify entries should be ignored when building the final copy list.

      In such cases, when the final copy list is built, DistCp looks up each created/modified file in the newer snapshot. The lookup fails because the parent directory has already been moved to a new location in the later snapshot.
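      The pruning step described above can be sketched in Java. This is a minimal, hypothetical model, not the actual DistCp code: `DiffType`, `DiffEntry`, `buildCopyList`, `isUnder`, and the `excluded` predicate (a stand-in for the DistCp filter) are all illustrative names. It assumes paths are relative to the snapshot root and that the filter excludes `.Trash`. The sample entries are loosely modelled on the reproduction that follows, but the exact snapshot diff report they represent is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: DiffType, DiffEntry and buildCopyList are illustrative
// names, not DistCp's actual classes or methods.
public class CopyListSketch {

    enum DiffType { CREATE, MODIFY, DELETE, RENAME }

    static final class DiffEntry {
        final DiffType type;
        final String source; // path relative to the snapshot root, e.g. "dir1/hosts"
        final String target; // rename destination; null for other entry types

        DiffEntry(DiffType type, String source, String target) {
            this.type = type;
            this.source = source;
            this.target = target;
        }
    }

    // True only if path lies strictly underneath dir ("dir1/hosts" under "dir1").
    static boolean isUnder(String path, String dir) {
        return path.startsWith(dir + "/");
    }

    // Returns the created/modified paths that are safe to look up in the new snapshot.
    // `excluded` is an assumed stand-in for the DistCp filter.
    static List<String> buildCopyList(List<DiffEntry> diff, Predicate<String> excluded) {
        // Step 1: collect paths marked for deletion; a rename into an excluded
        // directory is treated as a delete of its source.
        List<String> deleted = new ArrayList<>();
        for (DiffEntry e : diff) {
            boolean renamedAway = e.type == DiffType.RENAME && excluded.test(e.target);
            if (e.type == DiffType.DELETE || renamedAway) {
                deleted.add(e.source);
            }
        }

        // Step 2: keep create/modify entries unless they are excluded or
        // one of their ancestor directories is marked for deletion.
        List<String> copyList = new ArrayList<>();
        for (DiffEntry e : diff) {
            if (e.type != DiffType.CREATE && e.type != DiffType.MODIFY) {
                continue;
            }
            boolean underDeletedDir = false;
            for (String d : deleted) {
                if (isUnder(e.source, d)) {
                    underDeletedDir = true;
                    break;
                }
            }
            if (!underDeletedDir && !excluded.test(e.source)) {
                copyList.add(e.source);
            }
        }
        return copyList;
    }

    public static void main(String[] args) {
        // Entries loosely modelled on the reproduction; the real diff report is assumed.
        List<DiffEntry> diff = List.of(
            new DiffEntry(DiffType.RENAME, "dir1", ".Trash/hdfs/Current/data/gcgdlknnasg/dir1"),
            new DiffEntry(DiffType.CREATE, "dir1/hosts", null),
            new DiffEntry(DiffType.CREATE, "dir1", null));
        Predicate<String> excluded = p -> p.equals(".Trash") || p.startsWith(".Trash/");
        System.out.println(buildCopyList(diff, excluded)); // prints [dir1]
    }
}
```

      Only entries strictly under a deleted directory are dropped, so the CREATE entry for the re-created `dir1` survives while `dir1/hosts`, which no longer exists in the new snapshot, is skipped. New children added to a re-created directory would need additional handling that this sketch does not attempt.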

      Steps to reproduce:

      sudo -u kms hadoop key create testkey
      hadoop fs -mkdir -p /data/gcgdlknnasg/
      hdfs crypto -createZone -keyName testkey -path /data/gcgdlknnasg/
      hadoop fs -mkdir -p /dest/gcgdlknnasg
      hdfs crypto -createZone -keyName testkey -path /dest/gcgdlknnasg
      hdfs dfs -mkdir /data/gcgdlknnasg/dir1
      hdfs dfsadmin -allowSnapshot /data/gcgdlknnasg/ 
      hdfs dfsadmin -allowSnapshot /dest/gcgdlknnasg/ 
      
      [root@nightly62x-1 logs]# hdfs dfs -ls -R /data/gcgdlknnasg/
      drwxrwxrwt   - hdfs supergroup          0 2021-07-16 14:05 /data/gcgdlknnasg/.Trash
      drwxr-xr-x   - hdfs supergroup          0 2021-07-16 13:07 /data/gcgdlknnasg/dir1
      [root@nightly62x-1 logs]# hdfs dfs -ls -R /dest/gcgdlknnasg/
      [root@nightly62x-1 logs]#
      
      hdfs dfs -put /etc/hosts /data/gcgdlknnasg/dir1/
      hdfs dfs -rm -r /data/gcgdlknnasg/dir1/
      hdfs dfs -mkdir /data/gcgdlknnasg/dir1/
      
      ===> Now run the BDR replication with “Abort on Snapshot Diff Failures” checked in the replication schedule. The BDR job fails with the error below:
      
      21/07/16 15:02:30 INFO distcp.DistCp: Failed to use snapshot diff - 
      java.io.FileNotFoundException: File does not exist: /data/gcgdlknnasg/.snapshot/distcp-5-46485360-new/dir1/hosts
      	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1494)
      	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1487)
      ……..
      

              People

              • Assignee: Shashikant Banerjee (shashikant)
              • Reporter: Shashikant Banerjee (shashikant)
              • Votes: 0
              • Watchers: 2

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 3h 20m