Hadoop Common / HADOOP-10295

Allow distcp to automatically identify the checksum type of source files and use it for the target


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.4.0
    • Component/s: tools/distcp
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Add an option for distcp to preserve the checksum type of the source files. Users can pass "-pc" as a distcp command option to preserve the checksum type.

      Description

      Currently, when running distcp, users can pass "-Ddfs.checksum.type" to specify the checksum type used in the target FS. This works well if all source files use the same checksum type. If files in the source cluster use a mix of checksum types, users must either pass "-skipcrccheck" or hit checksum mismatch exceptions. We should therefore consider adding a new distcp option that automatically identifies the original checksum type of each source file and uses the same checksum type in the target FS.
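
      To illustrate why mixed checksum types lead to mismatches, here is a minimal sketch using only JDK classes. It is not distcp's implementation, and HDFS builds its file checksums differently; the class name, the ChecksumType enum, and the verifyCopy helper are hypothetical names introduced for this example. It shows only that identical bytes checksummed with CRC32 on one side and CRC32C on the other never compare equal, while reusing the source's type on the target does.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

class ChecksumTypeDemo {
    // Hypothetical stand-in for the checksum type a file was written with.
    enum ChecksumType { CRC32, CRC32C }

    // Compute the checksum of the data with the given algorithm (JDK classes only).
    static long checksum(ChecksumType type, byte[] data) {
        Checksum c;
        switch (type) {
            case CRC32:
                c = new CRC32();
                break;
            default:
                c = new CRC32C(); // requires Java 9 or later
                break;
        }
        c.update(data, 0, data.length);
        return c.getValue();
    }

    // Hypothetical post-copy check: the copied bytes are identical, so the
    // comparison can only succeed when both sides use the same checksum type.
    static boolean verifyCopy(byte[] data, ChecksumType sourceType, ChecksumType targetType) {
        return checksum(sourceType, data) == checksum(targetType, data);
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);

        // One fixed target type for every file: fails for files written with the other type.
        System.out.println(verifyCopy(data, ChecksumType.CRC32, ChecksumType.CRC32C)); // prints "false"

        // Target reuses each source file's own type: the comparison succeeds.
        for (ChecksumType sourceType : ChecksumType.values()) {
            System.out.println(verifyCopy(data, sourceType, sourceType)); // prints "true"
        }
    }
}
```

      In this sketch, reusing the source's checksum type per file plays the role that the proposed option is meant to play, whereas a single "-Ddfs.checksum.type" value corresponds to fixing one targetType for all files.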

        Attachments

        1. HADOOP-10295.000.patch (18 kB, Jing Zhao)
        2. hadoop-10295.patch (18 kB, Laurent Goujon)
        3. HADOOP-10295.002.patch (34 kB, Jing Zhao)

            People

            • Assignee: Jing Zhao (jingzhao)
            • Reporter: Jing Zhao (jingzhao)
            • Votes: 0
            • Watchers: 9
