Hadoop Common / HADOOP-16083

DistCp shouldn't always overwrite the target file when checksums match


Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0, 3.1.1, 3.3.0
    • Fix Version/s: None
    • Component/s: tools/distcp
    • Labels: None

    Description

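      CopyMapper#setup
      ...
          try {
            // Force overwrite whenever the target final path already exists as a file.
            overWrite = overWrite || targetFS.getFileStatus(targetFinalPath).isFile();
          } catch (FileNotFoundException ignored) {
            // Target does not exist yet: keep the value taken from the -overwrite option.
          }
      ...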
      

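      The code above sets overWrite to true whenever the target path is an existing file, even if -overwrite was not specified. With overWrite forced on, the checksum comparison that would normally let DistCp skip an unchanged file never takes effect, so the file is transferred again even when the source and target checksums match.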
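      To make the effect concrete, here is a simplified, hypothetical Java model of the current decision. The names CurrentOverwriteModel, Action, resolveOverwrite and decide are illustrative only, not Hadoop APIs, and the model collapses the real per-file checks (file length, block size, -update, -skipcrccheck) into a single checksumsMatch flag.

```java
// Simplified, hypothetical model of the current CopyMapper behaviour.
// CurrentOverwriteModel, Action, resolveOverwrite and decide are
// illustrative names only; they are not Hadoop APIs.
public class CurrentOverwriteModel {

    enum Action { SKIP, COPY }

    // Mirrors the setup() logic: the -overwrite option is OR-ed with
    // "the target final path exists and is a file".
    static boolean resolveOverwrite(boolean overwriteOption, boolean targetIsFile) {
        return overwriteOption || targetIsFile;
    }

    // Per-file decision: skipping is only considered when overwrite is false.
    static Action decide(boolean overwriteOption, boolean targetIsFile, boolean checksumsMatch) {
        boolean overWrite = resolveOverwrite(overwriteOption, targetIsFile);
        if (!overWrite && checksumsMatch) {
            return Action.SKIP;
        }
        return Action.COPY;
    }

    public static void main(String[] args) {
        // Single-file copy onto an existing, identical file, without -overwrite.
        System.out.println(decide(false, true, true));
    }
}
```

      Running main prints COPY: the unchanged file is transferred again because targetIsFile alone forces overwrite.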

      My suggestion is to remove the code above. A user who really wants to overwrite the target can pass -overwrite explicitly:

      DistCp command with -overwrite option
      hadoop distcp -overwrite hdfs://localhost:64464/source/5/6.txt hdfs://localhost:64464/target/5/6.txt
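      The proposed behaviour can be sketched the same way. This is again a hypothetical model (ProposedOverwriteModel and decide are illustrative names, not Hadoop APIs): overwrite comes only from the user's -overwrite option, so whether the target path is a file no longer matters.

```java
// Hypothetical model of the proposed behaviour: overwrite is taken only
// from the -overwrite option, never from the type of the target path.
public class ProposedOverwriteModel {

    enum Action { SKIP, COPY }

    static Action decide(boolean overwriteOption, boolean checksumsMatch) {
        // Without -overwrite, an existing target with matching checksums is skipped.
        if (!overwriteOption && checksumsMatch) {
            return Action.SKIP;
        }
        return Action.COPY;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, true)); // prints SKIP: identical target is left alone
        System.out.println(decide(true, true));  // prints COPY: -overwrite forces the transfer
    }
}
```

      The design choice is simply to make the user's intent explicit: the unnecessary transfer disappears by default, and the old behaviour remains one flag away.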
      

      Attachments

        1. HADOOP-16083.001.patch (0.8 kB, Siyao Meng)


          People

            Assignee: Siyao Meng (smeng)
            Reporter: Siyao Meng (smeng)
            Votes: 0
            Watchers: 7

            Dates

              Created:
              Updated: