Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.0
    • Component/s: fs/s3
    • Labels:
      None
    • Target Version/s:

      Description

      HADOOP-15273 shows that distcp can't handle non-HDFS filesystems whose checksums use a different algorithm from HDFS.

      Exposing ETags as checksums (HADOOP-13282) breaks workflows that back up to s3a.
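
      The following minimal Python model shows why the breakage happens. It assumes the post-copy check behaves as this issue describes: if either store serves no checksum, the comparison is skipped; otherwise the two checksums must be equal, including the algorithm name. `FileChecksum`, `checksums_match` and the algorithm strings are illustrative stand-ins, not Hadoop's Java API. Before HADOOP-13282, s3a served no checksum, so the check was skipped. Once s3a serves ETags, an HDFS source and an s3a destination can never compare equal in this model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileChecksum:
    """Illustrative stand-in for a filesystem checksum: algorithm name plus digest."""
    algorithm: str
    digest: bytes


def checksums_match(src: Optional[FileChecksum], dest: Optional[FileChecksum]) -> bool:
    """Model of a post-copy verification step.

    Returns True when verification is skipped (either side has no checksum)
    or when both checksums are equal.
    """
    if src is None or dest is None:
        # One store serves no checksum: silently skip the check.
        return True
    # Equality covers the algorithm name too, so checksums produced by
    # different algorithms never match, even over identical file contents.
    return src == dest


# Illustrative values only; the algorithm strings are assumptions.
hdfs_sum = FileChecksum("MD5-of-0MD5-of-512CRC32C", bytes.fromhex("a1b2c3"))
s3a_etag = FileChecksum("etag", bytes.fromhex("a1b2c3"))

skipped = checksums_match(hdfs_sum, None)        # s3a serving no checksum
mismatch = checksums_match(hdfs_sum, s3a_etag)   # s3a serving its ETag
```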

      Rather than revert it, I want to make it an option, off by default. Once we are happy with how distcp handles checksums, we can turn it on.
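
      Here is a minimal sketch of such a switch. It uses a plain string-to-string configuration map rather than Hadoop's `Configuration` class. The key `fs.s3a.etag.checksum.enabled` is an assumption here, so check the S3A documentation for the name in your release. `get_file_checksum` is a hypothetical helper. With the option unset, no checksum is served, which restores the behaviour distcp handled before HADOOP-13282.

```python
from typing import Mapping, Optional

# Assumed option name and default (off); verify against your S3A release.
ETAG_CHECKSUM_KEY = "fs.s3a.etag.checksum.enabled"
ETAG_CHECKSUM_DEFAULT = False


def get_file_checksum(conf: Mapping[str, str], etag: Optional[str]) -> Optional[str]:
    """Hypothetical helper: expose the object's ETag as its checksum only if enabled."""
    raw = conf.get(ETAG_CHECKSUM_KEY, str(ETAG_CHECKSUM_DEFAULT))
    enabled = raw.strip().lower() == "true"
    if not enabled or etag is None:
        # Default path: serve no checksum, so callers such as distcp skip comparison.
        return None
    return etag
```

Keeping the default off means existing backup jobs keep working unchanged, while anyone who needs ETag-based checks can opt in per job.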

      Why an option? Because it lines up with a successor to distcp that saves the source and destination checksums to a file and can then verify whether or not files have really changed. Currently distcp relies on the destination checksum algorithm being the same as the source's for incremental updates, but if either store doesn't serve checksums, it silently downgrades to not checking.
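
      One way to picture that successor is a manifest that records, for each file, the source and destination checksums from the last run. Each side is then compared only with its own earlier value, so the two stores no longer need to share a checksum algorithm. This is a hypothetical sketch. The JSON manifest layout and the `save_manifest`, `load_manifest` and `needs_copy` helpers are assumptions and are not part of distcp.

```python
import json
from typing import Dict, Optional, Tuple

# Hypothetical manifest: path -> (source checksum, destination checksum).
Manifest = Dict[str, Tuple[Optional[str], Optional[str]]]


def save_manifest(path: str, manifest: Manifest) -> None:
    """Write the manifest as JSON; tuples become two-element lists."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({k: list(v) for k, v in manifest.items()}, f, sort_keys=True)


def load_manifest(path: str) -> Manifest:
    """Read a manifest written by save_manifest, restoring tuples."""
    with open(path, encoding="utf-8") as f:
        return {k: (v[0], v[1]) for k, v in json.load(f).items()}


def needs_copy(previous: Manifest, path: str,
               src_now: Optional[str], dest_now: Optional[str]) -> bool:
    """Decide whether a file must be recopied.

    Each side is compared only with its own previously recorded checksum,
    so source and destination algorithms never have to match.
    """
    if path not in previous:
        return True  # never copied before
    src_then, dest_then = previous[path]
    if None in (src_now, dest_now, src_then, dest_then):
        return True  # cannot verify, so copy rather than silently skip
    return src_now != src_then or dest_now != dest_then
```

Unlike the silent downgrade described above, this sketch recopies a file whenever any checksum is missing. A real tool might warn instead.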

        Attachments

        1. HADOOP-15297-001.patchh
          12 kB
          Steve Loughran
        2. HADOOP-15297-002.patch
          12 kB
          Steve Loughran
        3. HADOOP-15297-002.patch
          12 kB
          Steve Loughran
        4. HADOOP-15297-003.patch
          12 kB
          Steve Loughran

              People

              • Assignee:
                stevel@apache.org Steve Loughran
                Reporter:
                stevel@apache.org Steve Loughran
              • Votes:
                0
                Watchers:
                5

                Dates

                • Created:
                  Updated:
                  Resolved: