Hadoop Common / HADOOP-15273

distcp can't handle remote stores with different checksum algorithms


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.0, 3.0.3
    • Component/s: tools/distcp
    • Labels: None

    Description

      When using distcp without -skipcrccheck: if there is a checksum mismatch between the source and destination store types (e.g. HDFS to S3), the error message talks about block size, even when it is the underlying checksum algorithm itself that is the cause of the failure:

      Source and target differ in block-size. Use -pb to preserve block-sizes during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. (NOTE: By skipping checksums, one runs the risk of masking data-corruption during file-transfer.)

      Update: the CRC check always takes place on a distcp upload before the file is renamed into place, and it cannot be disabled at that point.
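      A minimal sketch of the kind of fix this implies (this is an illustration, not the actual Hadoop patch): before comparing checksum bytes, compare the checksum algorithm names, so that cross-store copies report an algorithm mismatch instead of misleading advice about block size. The `FileChecksum` class below is a hypothetical stand-in for Hadoop's own class of that name.

```java
import java.util.Arrays;

// Sketch only: compare algorithm names before checksum values, so a
// cross-store mismatch (e.g. HDFS CRC vs. an object store's checksum)
// yields a message about the algorithm, not about block size.
public class ChecksumCompareSketch {

    // Hypothetical stand-in for Hadoop's FileChecksum: name + bytes.
    static final class FileChecksum {
        final String algorithm;
        final byte[] bytes;
        FileChecksum(String algorithm, byte[] bytes) {
            this.algorithm = algorithm;
            this.bytes = bytes;
        }
    }

    /** Returns null if the checksums match, else a diagnostic message. */
    static String compare(FileChecksum src, FileChecksum dst) {
        if (src == null || dst == null) {
            // One store exposes no checksum: nothing to compare.
            return null;
        }
        if (!src.algorithm.equals(dst.algorithm)) {
            return "Checksum algorithms differ (" + src.algorithm
                + " vs " + dst.algorithm
                + "); checksums cannot be compared between these stores";
        }
        if (!Arrays.equals(src.bytes, dst.bytes)) {
            return "Checksum mismatch; source and target may differ in "
                + "block size (use -pb to preserve block sizes) or the "
                + "data may be corrupt";
        }
        return null;
    }
}
```

      With this split, the block-size hint is only emitted when the algorithms actually agree and the values still differ, which is the one case where block size can be the cause.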

      Attachments

        1. HADOOP-15273-003.patch
          5 kB
          Steve Loughran
        2. HADOOP-15273-002.patch
          5 kB
          Steve Loughran
        3. HADOOP-15273-001.patch
          3 kB
          Steve Loughran

            People

              Assignee: Stephen O'Donnell (sodonnell)
              Reporter: Steve Loughran (stevel@apache.org)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved: