Hadoop Common / HADOOP-15281

Distcp to add no-rename copy option

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.3.0, 3.2.1, 3.1.3
    • Component/s: tools/distcp
    • Labels: None

    Description

      Currently DistCp uploads a file using one of two strategies:

      1. append parts
      2. copy to temp then rename

      Option 2 executes the following sequence in promoteTmpToTarget:

          if ((fs.exists(target) && !fs.delete(target, false))
              || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
              || !fs.rename(tmpTarget, target)) {
            throw new IOException("Failed to promote tmp-file:" + tmpTarget
                                    + " to: " + target);
          }
      

      For any object store, that is a lot of HTTP requests. On S3A it amounts to 12+ requests plus an O(data) copy call, because rename there is implemented as a copy followed by a delete.

      This is not a good upload strategy for any store that manifests its output atomically at the end of the write(): the temporary file and the rename add cost without adding any protection against partial output.

      Proposed: add a switch to write directly to the dest path, which can be supplied as either a conf option (distcp.direct.write = true) or a CLI option (-direct).
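
      The difference between the two code paths can be sketched as follows. This is only an illustration under stated assumptions, not the code from the attached patches: it uses java.nio.file on the local filesystem as a stand-in for the Hadoop FileSystem API, and the class name DirectWriteSketch, the methods chooseWritePath and copy, and the ".distcp.tmp." prefix are all hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch: temp-then-rename versus direct write. */
public class DirectWriteSketch {

    /**
     * Picks the path the copy writes to. With directWrite set, that is the
     * final target; otherwise it is a temporary sibling of the target.
     * The ".distcp.tmp." prefix is an assumption made for this example.
     */
    public static Path chooseWritePath(Path target, boolean directWrite) {
        if (directWrite) {
            return target;
        }
        return target.resolveSibling(".distcp.tmp." + target.getFileName());
    }

    /** Copies source to target using the selected strategy. */
    public static void copy(Path source, Path target, boolean directWrite)
            throws IOException {
        Path writePath = chooseWritePath(target, directWrite);

        // Both strategies need the parent directory to exist.
        Files.createDirectories(target.toAbsolutePath().getParent());

        // The data is written exactly once in either case.
        Files.copy(source, writePath, StandardCopyOption.REPLACE_EXISTING);

        if (!directWrite) {
            // Promotion step. On a real filesystem this rename is cheap;
            // on an object store it is the expensive part that a direct
            // write avoids.
            Files.move(writePath, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

      With directWrite set to true the promotion step disappears entirely, which is the behaviour the proposed distcp.direct.write / -direct switch asks for.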

      Attachments

        1. HADOOP-15281-001.patch
          21 kB
          Andrew Olson
        2. HADOOP-15281-002.patch
          22 kB
          Andrew Olson
        3. HADOOP-15281-003.patch
          22 kB
          Andrew Olson
        4. HADOOP-15281-004.patch
          22 kB
          Andrew Olson
        5. HADOOP-15281-branch-2-001.patch
          26 kB
          Steve Loughran
        6. HADOOP-15281-branch-2-002.patch
          26 kB
          Andrew Olson

            People

              Assignee: noslowerdna Andrew Olson
              Reporter: stevel@apache.org Steve Loughran
              Votes: 1
              Watchers: 16
