Hadoop Common / HADOOP-18596

Distcp -update between different cloud stores to use modification time while checking for file skip.


Details

    Description

      DistCp -update currently relies on file size, block size, and checksum comparisons to decide which files to skip and which to copy.
      Because different cloud stores use different checksum algorithms, checksums cannot be compared reliably across stores, so modification time should be added to the checks.

      This ensures that during -update, files that appear to be out of sync are copied. The clocks of the machines between which files are transferred should be synchronized to avoid unnecessary copies.
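
The check described above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not DistCp's actual implementation: the class `UpdateSkipSketch`, the `FileInfo` holder, and the method `canSkip` are hypothetical names, and the rule "skip only if the target is at least as new as the source" is one plausible reading of the proposal. Block size comparison is omitted for brevity.

```java
// Hypothetical sketch: names and signatures are illustrative, not DistCp's API.
final class UpdateSkipSketch {

    /** Minimal stand-in for the file metadata the skip check needs. */
    static final class FileInfo {
        final long length;
        final long modificationTime;    // milliseconds since the epoch
        final String checksumAlgorithm; // null if the store exposes no checksum
        final String checksum;          // null if the store exposes no checksum

        FileInfo(long length, long modificationTime,
                 String checksumAlgorithm, String checksum) {
            this.length = length;
            this.modificationTime = modificationTime;
            this.checksumAlgorithm = checksumAlgorithm;
            this.checksum = checksum;
        }
    }

    /** Returns true if the copy of source to target can be skipped. */
    static boolean canSkip(FileInfo source, FileInfo target) {
        // Different lengths always mean the files are out of sync.
        if (source.length != target.length) {
            return false;
        }
        // Checksums are only comparable when both stores use the same algorithm.
        boolean comparable = source.checksumAlgorithm != null
                && source.checksumAlgorithm.equals(target.checksumAlgorithm)
                && source.checksum != null
                && target.checksum != null;
        if (comparable) {
            return source.checksum.equals(target.checksum);
        }
        // Assumption: without comparable checksums, treat the target as in sync
        // only if it was written at or after the source's last modification.
        return target.modificationTime >= source.modificationTime;
    }

    public static void main(String[] args) {
        FileInfo source = new FileInfo(1024L, 2_000L, null, null);
        FileInfo olderTarget = new FileInfo(1024L, 1_000L, null, null);
        FileInfo newerTarget = new FileInfo(1024L, 3_000L, null, null);
        System.out.println("older target skipped: " + canSkip(source, olderTarget));
        System.out.println("newer target skipped: " + canSkip(source, newerTarget));
    }
}
```

Because the fallback depends on timestamps taken from two different systems, clock skew between them can make an up-to-date target look stale and trigger an extra copy, which is why the description asks for the machines to be in time sync.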

      Testing and documentation for modification time checks between different object stores should also be improved to ensure that files are not skipped incorrectly.

            People

              mehakmeetSingh Mehakmeet Singh
              Votes: 0
              Watchers: 6
