HADOOP-15273 (Hadoop Common)

distcp can't handle remote stores with different checksum algorithms

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.0, 3.0.3
    • Component/s: tools/distcp
    • Labels:
      None

      Description

      When using distcp without -skipcrccheck, if there's a checksum mismatch between the source and destination store types (e.g. HDFS to S3), the error message talks about block size, even when it is the underlying checksum protocol itself that causes the failure:

      Source and target differ in block-size. Use -pb to preserve block-sizes during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. (NOTE: By skipping checksums, one runs the risk of masking data-corruption during file-transfer.)

      Update: the CRC check always takes place on a distcp upload before the file is renamed into place, and it cannot be disabled at that point.
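For context, a minimal sketch of the kind of invocation affected; the host, paths, and bucket name here are hypothetical:

```shell
# Plain copy from HDFS to an s3a bucket: HDFS exposes CRC block checksums
# while the S3A connector cannot return a comparable checksum, so the
# post-copy verification fails with the misleading block-size message
# quoted above.
hadoop distcp hdfs://nn:8020/data/logs s3a://example-bucket/logs

# Workaround: skip the checksum comparison entirely (distcp requires
# -update alongside -skipcrccheck). This also masks real corruption.
hadoop distcp -update -skipcrccheck hdfs://nn:8020/data/logs s3a://example-bucket/logs
```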

        Attachments

        1. HADOOP-15273-003.patch
          5 kB
          Steve Loughran
        2. HADOOP-15273-002.patch
          5 kB
          Steve Loughran
        3. HADOOP-15273-001.patch
          3 kB
          Steve Loughran

              People

              • Assignee:
                stevel@apache.org Steve Loughran
              • Reporter:
                stevel@apache.org Steve Loughran
              • Votes:
                0
              • Watchers:
                4
