HIVE-13704

Call DistCp.run() instead of DistCp.execute()

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.3.0, 2.0.0
    • Fix Version/s: 2.1.1, 2.2.0
    • Component/s: Hive
    • Labels: None

    Description

      HIVE-11607 switched Hive's DistCp invocation from run() to execute(). The run() method performs additional setup that drives the state of SimpleCopyListing, which runs in the driver, and of CopyCommitter, which runs in the job runtime.

      When Hive uses DistCp for copy work (between non-matching filesystems, or between encrypted and non-encrypted zones, for sizes above a configured value), the missing state causes wrong paths to appear on the target: subdirectories named after the file, instead of just the file.

      Hive should call DistCp's Tool run() method rather than calling execute() directly, so that it does not skip the target-exists flag that the setTargetPathExists call would set:

      https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126
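
      The effect described above can be illustrated with a small self-contained model. This is only a sketch: ToyDistCp is a hypothetical stand-in and not the real org.apache.hadoop.tools.DistCp, the simulated filesystem is a plain set of path strings, and both the default value of the flag and the path logic in execute() are assumptions made for illustration. It shows how skipping the setup step that run() performs can leave a stale target-exists flag, so the file lands in a subdirectory named after itself.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model; NOT the real org.apache.hadoop.tools.DistCp.
class ToyDistCp {
    private final Set<String> existingPaths; // simulated target filesystem
    private final String source;
    private final String target;
    // Assumption of this model: the target is presumed to exist until probed.
    private boolean targetPathExists = true;

    ToyDistCp(Set<String> existingPaths, String source, String target) {
        this.existingPaths = existingPaths;
        this.source = source;
        this.target = target;
    }

    // Analogue of the setup step: record whether the target really exists.
    private void setTargetPathExists() {
        targetPathExists = existingPaths.contains(target);
    }

    // Analogue of Tool.run(): prepare state first, then perform the copy.
    String run() {
        setTargetPathExists();
        return execute();
    }

    // Analogue of execute(): copies using whatever state is currently set.
    String execute() {
        String name = source.substring(source.lastIndexOf('/') + 1);
        // An existing target is treated as a directory to copy into.
        String written = targetPathExists ? target + "/" + name : target;
        existingPaths.add(written);
        return written;
    }
}

class ToyDistCpDemo {
    public static void main(String[] args) {
        String src = "/warehouse/src/part-0";
        String dst = "/warehouse/dst/part-0";
        // Via run(): the flag is refreshed, the file is written at the target.
        System.out.println(new ToyDistCp(new HashSet<>(), src, dst).run());
        // Via execute() only: the stale flag yields a subdirectory named after the file.
        System.out.println(new ToyDistCp(new HashSet<>(), src, dst).execute());
    }
}
```

      In this model, the run() path writes /warehouse/dst/part-0, while the direct execute() path writes /warehouse/dst/part-0/part-0, which mirrors the symptom reported in this issue.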

      Attachments

        1. HIVE-13704.1.patch (1 kB, Sergio Peña)

            People

              Assignee: Sergio Peña (spena)
              Reporter: Harsh J (qwertymaniac)
              Votes: 0
              Watchers: 7
