Hadoop Common / HADOOP-15273

distcp can't handle remote stores with different checksum algorithms


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.0, 3.0.3
    • Component/s: tools/distcp
    • Labels:
      None
    • Target Version/s:

      Description

When using distcp without -skipcrccheck, if there is a checksum mismatch between the source and destination store types (e.g. HDFS to S3), the error message talks about block size, even when it's the underlying checksum algorithm itself that is the cause of the failure:

      Source and target differ in block-size. Use -pb to preserve block-sizes during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. (NOTE: By skipping checksums, one runs the risk of masking data-corruption during file-transfer.)

Update: the CRC check always takes place on a distcp upload before the file is renamed into place, and it cannot be disabled at that point.
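A minimal sketch of the idea behind the fix: when the two stores report checksums with different algorithm names, the tool should report an algorithm mismatch rather than the misleading block-size advice. This is not the actual DistCp code; the class and method names below are hypothetical, and only the quoted message text and the -pb / -skipCrc / -skipcrccheck option names come from the issue itself.

```java
// Hypothetical sketch, not the HADOOP-15273 patch itself: models the
// decision between "algorithms differ" and "same algorithm, values differ".
public class ChecksumMismatchMessage {

    /** True when the two checksum algorithm names can be compared at all. */
    static boolean algorithmsComparable(String srcAlgorithm, String dstAlgorithm) {
        return srcAlgorithm != null && srcAlgorithm.equals(dstAlgorithm);
    }

    /** Chooses an error message appropriate to the kind of mismatch. */
    static String mismatchMessage(String srcAlgorithm, String dstAlgorithm) {
        if (!algorithmsComparable(srcAlgorithm, dstAlgorithm)) {
            // Different algorithms: block size is irrelevant, say so directly.
            return "Checksum algorithms of source and target differ ("
                + srcAlgorithm + " vs " + dstAlgorithm
                + "). Use -skipcrccheck to disable checksum verification.";
        }
        // Same algorithm but different values: block size is a plausible cause.
        return "Source and target differ in block-size. Use -pb to preserve "
            + "block-sizes during copy. Alternatively, skip checksum-checks "
            + "altogether, using -skipCrc.";
    }

    public static void main(String[] args) {
        // HDFS-style checksum name vs. an object store returning no checksum.
        System.out.println(
            mismatchMessage("MD5-of-0MD5-of-512CRC32C", "NULL"));
        System.out.println(
            mismatchMessage("MD5-of-0MD5-of-512CRC32C",
                            "MD5-of-0MD5-of-512CRC32C"));
    }
}
```

In the real Hadoop API, the algorithm name would come from `FileChecksum.getAlgorithmName()` on the checksums returned by `FileSystem.getFileChecksum()` for the source and target paths.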

        Attachments

        1. HADOOP-15273-003.patch
          5 kB
          Steve Loughran
        2. HADOOP-15273-002.patch
          5 kB
          Steve Loughran
        3. HADOOP-15273-001.patch
          3 kB
          Steve Loughran


        People

        • Assignee: Stephen O'Donnell (sodonnell)
        • Reporter: Steve Loughran (stevel@apache.org)

        Dates

        • Created:
        • Updated:
        • Resolved:
