  1. Hadoop Common
  2. HADOOP-16440

DistCp cannot preserve timestamps with the -delete option

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.7, 3.1.2
    • Fix Version/s: 3.3.0, 3.2.1, 3.1.3
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Flags:
      Patch

      Description

      Use distcp with -prbugpcaxt and -delete to copy data between clusters.

      hadoop distcp -Dmapreduce.job.queuename="QueueA" -prbugpcaxt -update -delete  hdfs://sourcecluster/user/hive/warehouse/sum.db hdfs://destcluster/user/hive/warehouse/sum.db

      After distcp finished, we found that timestamps on the destination differed from the source: some directories carried the time at which distcp ran.

      Looking at the distcp code, CopyCommitter preserves the timestamps first and then processes the -delete option, which changes the timestamps of the destination directories again. So the -delete option should be processed first.
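
      The ordering problem can be reproduced in miniature on a local file system. The following is only a sketch under stated assumptions, not the actual CopyCommitter code: the class name CommitOrderSketch, the method mtimePreserved, and the fixed SOURCE_MTIME value are all hypothetical, and a local temporary directory stands in for the HDFS destination. It assumes a file system where removing an entry updates the parent directory's modification time, which is the behaviour described in this issue.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical stand-in for the commit phase; not the real CopyCommitter.
public class CommitOrderSketch {
    // Assumed "source" timestamp that we want the destination directory to keep.
    static final FileTime SOURCE_MTIME = FileTime.fromMillis(1_000_000_000_000L);

    // Runs the two commit steps in the requested order and reports whether the
    // destination directory still carries SOURCE_MTIME afterwards.
    static boolean mtimePreserved(boolean deleteFirst) {
        try {
            Path dir = Files.createTempDirectory("dest");
            // A file that exists only on the destination, so -delete would remove it.
            Path extra = Files.createFile(dir.resolve("missing-in-source"));
            try {
                if (deleteFirst) {
                    Files.delete(extra);                          // step 1: delete
                    Files.setLastModifiedTime(dir, SOURCE_MTIME); // step 2: preserve
                } else {
                    Files.setLastModifiedTime(dir, SOURCE_MTIME); // step 1: preserve
                    Files.delete(extra);                          // step 2: delete touches dir
                }
                return Files.getLastModifiedTime(dir).toMillis() == SOURCE_MTIME.toMillis();
            } finally {
                // Clean up whatever is left of the temporary directory.
                Files.deleteIfExists(extra);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("preserve then delete: " + mtimePreserved(false));
        System.out.println("delete then preserve: " + mtimePreserved(true));
    }
}
```

      With the delete step first, the timestamp set afterwards is the last write to the directory, so it survives. With the original order, the result depends on whether the file system updates the directory's modification time on delete.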

       

        Attachments

        1. HDFS-14621.004.patch
          6 kB
          ludun
        2. HDFS-14621.003.patch
          7 kB
          ludun
        3. HDFS-14621.002.patch
          7 kB
          ludun
        4. HDFS-14261.001.patch
          5 kB
          ludun

            People

            • Assignee:
              pilchard ludun
            • Reporter:
              pilchard ludun
            • Votes:
              0
            • Watchers:
              4
