Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-15300

distcp -update to WASB and ADL copies up all the files, always

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: fs/adl, fs/azure
    • Labels:
      None

      Description

      If you use distcp -update to an adl or wasb store, repeatedly, all the source files are copied up every time. In contrast, if you use hdfs:// or s3a:// as a destination, only the new ones are uploaded. hdfs uses checksums for a diff, but s3a is just returning file length and relying on distcp logic being "if either src or dest doesn't do checksums, only compare file len"

      somehow that's not kicking in. Tested for file: and hdfs sources, wasb and adl dests

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                stevel@apache.org Steve Loughran
              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated: