  1. Hadoop Common
  2. HADOOP-8940

Add a resume feature to the copyFromLocal and put commands

    Details

    • Type: New Feature
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.0.1-alpha
    • Fix Version/s: None
    • Component/s: tools
    • Labels:
      None

      Description

      Add a resume feature to the copyFromLocal and put commands. Failures in large transfers waste a great deal of time. For large files, it would be good to be able to continue from the last good block onwards. The file would have to be unavailable to other clients for reads or regular writes until the "resume" process was completed.
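
      The "continue from the last good block" idea can be illustrated with a minimal sketch on ordinary local files. This is not the HDFS client API: the function name, the tiny block size, and the assumption that every complete block already at the destination is a valid prefix of the source are all hypothetical. A real implementation would likely verify those blocks (for example with checksums) before trusting them.

```python
import os

BLOCK_SIZE = 4  # hypothetical block size in bytes; tiny so the example is easy to follow


def resume_copy(src_path, dst_path, block_size=BLOCK_SIZE):
    """Copy src to dst, continuing from the last complete block already in dst.

    Assumes (without checking) that the complete blocks in dst match src.
    Returns the byte offset at which copying resumed.
    """
    dst_exists = os.path.exists(dst_path)
    existing = os.path.getsize(dst_path) if dst_exists else 0
    # Never resume past the end of the source.
    existing = min(existing, os.path.getsize(src_path))
    # Trust only whole blocks: a trailing partial block may be incomplete.
    resume_at = (existing // block_size) * block_size

    mode = "r+b" if dst_exists else "wb"
    with open(src_path, "rb") as src, open(dst_path, mode) as dst:
        dst.truncate(resume_at)  # drop any trailing partial block
        src.seek(resume_at)
        dst.seek(resume_at)
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            dst.write(chunk)
    return resume_at
```

      If the destination holds one full block plus part of a second, the sketch discards the partial block and resumes at the block boundary rather than at the last byte written.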

        Activity

        revans2 Robert Joseph Evans added a comment -

        It almost sounds like you want to turn this into something like rsync. I think it would be much more useful to just add an rsync command with a similar set of features and flags than to try to reinvent it piecemeal. Then it can look at time stamps on the files, and possibly checksums as well, to pick up where it left off on a failure.

        eli Eli Collins added a comment -

        Yea, something like rsync or Sqoop for file systems seems more appropriate.

        adam.muise Adam Muise added a comment -

        Yes, this should probably look like rsync.

        No, Sqoop does not support this use case.

        qwertymaniac Harsh J added a comment -

        "Then it can look at time stamps on the files, and possibly checksums as well, to pick up where it left off on a failure."

        You could also do this with DistCp's -update flag, with -Dmapreduce.framework.name=local passed through for Local FS file:/// sources. I'm uncertain if the checksum checks would work though, unless the files were written by the Checksumming FS. Useful for a lot of files, but probably not if what's needed is independent file-level append-like resume.
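
        As a rough illustration of the DistCp suggestion above, the following sketch only assembles the command line; it does not run anything. The helper name, the local path, and the NameNode URI are hypothetical, and the sketch assumes the generic -D option goes before DistCp's own flags.

```python
def build_distcp_update_command(local_source, hdfs_target):
    """Return the argv list for a DistCp -update run with the local MapReduce runner."""
    return [
        "hadoop", "distcp",
        "-Dmapreduce.framework.name=local",  # generic option: run the job in-process
        "-update",                           # copy only files missing or different at the target
        "file://" + local_source,            # absolute local path becomes a file:/// URI
        hdfs_target,
    ]


# Hypothetical paths, for illustration only.
command = build_distcp_update_command("/data/big.bin", "hdfs://namenode:8020/user/demo/")
print(" ".join(command))
```

        The resulting list could be passed to subprocess.run(command, check=True) on a machine with a Hadoop client installed. As the comment notes, this works at file granularity and does not resume inside a single partially written file.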


          People

          • Assignee:
            mahesh.ksl Mahesh Dharmasena
            Reporter:
            adam.muise Adam Muise
          • Votes:
            0
            Watchers:
            7
