Hive / HIVE-13704

Don't call DistCp.execute(); call DistCp.run() instead

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.3.0, 2.0.0
    • Fix Version/s: 2.1.1, 2.2.0
    • Component/s: Hive
    • Labels:
      None

      Description

      HIVE-11607 switched Hive's DistCp invocation from run to execute. The run method performs additional setup that drives the state of SimpleCopyListing, which runs in the driver, and of CopyCommitter, which runs in the job runtime.

      When Hive runs DistCp for copy work (between non-matching filesystems, or between encrypted and non-encrypted zones, for sizes above a configured value), this state is never set, which causes wrong paths to appear on the target: subdirectories named after the file, instead of just the file.

      Hive should call DistCp's Tool run method rather than calling execute directly, so that it does not skip the target-exists flag that the setTargetPathExists call sets:

      https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126
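
The failure mode described above can be sketched with a toy model. This is not Hadoop's DistCp code: `ToyDistCp`, its fields, the in-memory "filesystem" set, and the example paths are all hypothetical, and the flag's default value of `true` is an assumption made for illustration. The sketch only shows the general pattern, where `run()` establishes state that `execute()` depends on, so calling `execute()` directly resolves the destination differently.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy stand-in for a Tool-style copier. NOT Hadoop's DistCp. */
final class ToyDistCp {
    // Assumption of this sketch: the flag stays true until run() probes the target.
    private boolean targetPathExists = true;
    private final Set<String> existingPaths; // pretend filesystem contents
    private final String source;
    private final String target;

    ToyDistCp(Set<String> existingPaths, String source, String target) {
        this.existingPaths = existingPaths;
        this.source = source;
        this.target = target;
    }

    /** Intended entry point: sets the target-exists state, then copies. */
    String run() {
        setTargetPathExists();
        return execute();
    }

    /** Copy step: returns the resolved destination path for the source file. */
    String execute() {
        if (targetPathExists) {
            // Target is treated as an existing directory, so the file lands inside it.
            String name = source.substring(source.lastIndexOf('/') + 1);
            return target + "/" + name;
        }
        // Target does not exist, so it becomes the file itself.
        return target;
    }

    private void setTargetPathExists() {
        targetPathExists = existingPaths.contains(target);
    }
}

final class DistCpRunVsExecute {
    public static void main(String[] args) {
        Set<String> fs = new HashSet<>(); // the target path does not exist yet
        String src = "/staging/part-0";
        String dst = "/warehouse/t/part-0";

        // Through run(): the flag is probed first, so the file is written at dst.
        System.out.println(new ToyDistCp(fs, src, dst).run());     // /warehouse/t/part-0
        // Through execute() alone: the flag is stale, so a subdirectory appears.
        System.out.println(new ToyDistCp(fs, src, dst).execute()); // /warehouse/t/part-0/part-0
    }
}
```

In this model, the second call produces a path with a directory named after the file, which mirrors the symptom reported here. The real DistCp logic involves more state than a single boolean, so treat this only as an illustration of why the entry point matters.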

              People

              • Assignee: Sergio Peña (spena)
              • Reporter: Harsh J (qwertymaniac)
