Hadoop Common / HADOOP-13002

distcp behaves differently through code compared to toolrunner invocation from command-line

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.5.0, 2.6.0, 2.7.0, 3.0.0-alpha1
    • Fix Version/s: None
    • Component/s: tools/distcp
    • Labels:
      None

      Description

      In Hadoop 2.5, the behavior of distcp changed when it is invoked through code, if and only if the target directory does not exist and neither -update nor -atomic is used.
      HADOOP-10459 introduced a change to preserve the root directory attributes. It added a derived property, stored both in the options and in the configuration, that records whether the target path exists. See https://github.com/apache/hadoop/commit/c5b59477775c797944db4992e8a70289ba2895ed
      However, this property is set only when distcp is run from the command line through ToolRunner, in DistCp.run(String[] argv).
      As a result, when distcp is invoked through code and the target directory doesn't exist (and neither -update nor -atomic is used), SimpleCopyListing incorrectly assumes that the target directory exists, because the property defaults to true. Copying directory a/b/c to xyz then creates a directory xyz/c holding the content of c, rather than copying the content of c directly into xyz.
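
The effect of the flag on destination paths can be illustrated with a small standalone model. This is only a sketch and not Hadoop code: the class TargetPathExistsSketch, the method destinationFor, and the file name part-0 are hypothetical, and the mapping rule is a simplified assumption about how a single-directory copy without -update or -atomic is laid out.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Simplified, hypothetical model of the path mapping; not Hadoop code. */
class TargetPathExistsSketch {

    /**
     * Maps one source file to its destination for a copy of a single
     * directory, assuming neither -update nor -atomic is used.
     */
    public static String destinationFor(String sourceDir, String sourceFile,
                                        String targetDir, boolean targetPathExists) {
        Path source = Paths.get(sourceDir);
        // Assumption: if the target is believed to exist, file paths are taken
        // relative to the parent of the source directory, so the directory
        // name itself is kept. Otherwise they are taken relative to the
        // source directory, so only its content is copied.
        Path root = targetPathExists ? source.getParent() : source;
        Path relative = root.relativize(Paths.get(sourceFile));
        // Normalize separators so the result reads the same on every platform.
        return Paths.get(targetDir).resolve(relative).toString().replace('\\', '/');
    }

    public static void main(String[] args) {
        // Flag reflects reality (target xyz does not exist): prints xyz/part-0
        System.out.println(destinationFor("a/b/c", "a/b/c/part-0", "xyz", false));
        // Flag left at its default of true: prints xyz/c/part-0
        System.out.println(destinationFor("a/b/c", "a/b/c/part-0", "xyz", true));
    }
}
```

In this model the only difference between the two calls is the boolean, which mirrors the report: the command-line path sets the property from the actual state of the target, while the programmatic path leaves it at the default.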

              People

              • Assignee:
                gsteelman Gary Steelman
                Reporter:
                jrottinghuis Joep Rottinghuis
              • Votes:
                0
                Watchers:
                4
