  1. Hadoop Common
  2. HADOOP-10459

distcp V2 doesn't preserve root dir's attributes when -p is specified

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.5.0
    • Component/s: tools/distcp
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Two issues were observed with DistCp V2.

      ISSUE 1. When copying a source dir to a target dir with the "-pu" option, using the command

      "distcp -pu source-dir target-dir"

      The source dir's owner is not preserved at the target dir. Similarly, other attributes of the source dir are not preserved. They are supposed to be preserved when neither -update nor -overwrite is specified.

      There are two scenarios with the above command:

      a. When target-dir already exists. Issuing the above command results in target-dir/source-dir at the target file system (source-dir here refers to the last component of the source-dir path on the command line), with all contents of source-dir copied under target-dir/source-dir. The issue in this case is that the attributes of source-dir are not preserved.

      b. When target-dir doesn't exist. The command results in target-dir with all contents of source-dir copied under target-dir. The issue in this case is that the attributes of source-dir are not carried over to target-dir.

      For the multiple-source case, e.g., the command

      "distcp -pu source-dir1 source-dir2 target-dir"

      Regardless of whether target-dir exists, the sources are copied under target-dir (which is created if it didn't exist), and their attributes are preserved.
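
      The layouts described for ISSUE 1 can be summarized in a small sketch. This is a simplified model of the behavior reported above (no -update, no -overwrite), not DistCp's actual code; the function name `planned_roots` and its parameters are hypothetical and exist only for illustration.

```python
import posixpath


def planned_roots(sources, target, target_exists):
    """Map each source dir to the target path its contents land under.

    Hypothetical model of the layout described in this report; it does not
    call or reproduce any Hadoop API.
    """
    if len(sources) == 1 and not target_exists:
        # Scenario b: target-dir is created and stands in for source-dir itself.
        return {sources[0]: target}
    # Scenario a and the multiple-source case: each source keeps its
    # last path component under target-dir.
    return {
        src: posixpath.join(target, posixpath.basename(src.rstrip("/")))
        for src in sources
    }
```

      In this model, the path each source maps to is the directory whose attributes should match the source dir when -p is given. Per the report, that currently holds only for the multiple-source case.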

      ISSUE 2. with the following command:

      "distcp source-dir target-dir"

      When source-dir is an empty directory and target-dir doesn't exist, source-dir is not copied; the command behaves like a no-op. However, when source-dir is not empty, it is copied, resulting in target-dir at the target file system containing a copy of source-dir's children.

      To be consistent, an empty source dir should be copied too. That is, the above distcp command should cause target-dir to be created at the target file system, with source-dir's attributes preserved at target-dir when -p is passed.
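
      The expected behavior for ISSUE 2 can be illustrated with a toy in-memory model. This is only a sketch of the desired semantics, not the patch attached to this issue; the dictionary-based file system, the function `copy_dir`, and the default owner "copier" are all assumptions made for the example.

```python
def copy_dir(fs, src, dst, preserve_owner=False):
    """Copy a directory tree inside a toy file system.

    fs maps a path to {"owner": str, "children": [names]}; only directories
    are modeled. Everything here is hypothetical and unrelated to Hadoop APIs.
    """
    src_node = fs[src]
    # Create the target root unconditionally, even when the source is empty.
    dst_node = fs.setdefault(dst, {"owner": "copier", "children": []})
    if preserve_owner:
        # Carry the source root's owner over to the target root.
        dst_node["owner"] = src_node["owner"]
    for name in src_node["children"]:
        if name not in dst_node["children"]:
            dst_node["children"].append(name)
        copy_dir(fs, f"{src}/{name}", f"{dst}/{name}", preserve_owner)
    return fs
```

      With an empty source directory, the model still creates the target root and, when `preserve_owner` is set, gives it the source's owner, which is the consistency this report asks for.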

        Attachments

        1. HDFS-6152.003.patch
          46 kB
          Yongjun Zhang
        2. HDFS-6152.002.patch
          46 kB
          Yongjun Zhang
        3. HDFS-6152.002.patch
          46 kB
          Yongjun Zhang
        4. HDFS-6152.001.patch
          45 kB
          Yongjun Zhang

              People

              • Assignee:
                yzhangal Yongjun Zhang
                Reporter:
                yzhangal Yongjun Zhang
              • Votes:
                0
                Watchers:
                10