  1. Hadoop Common
  2. HADOOP-1292

dfs -copyToLocal should guarantee file is complete

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels:
      None

      Description

      We should copy to a temporary file, perhaps _tmp.<realname>, and rename it to <realname> only when the copy is complete. Restarting a copy should reuse the existing _tmp file after verifying its checksum. Then ^C-ing a copy will do the right thing: no partial file is ever left under the real name.
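A minimal sketch of the copy-then-rename idea, written against the plain Java `java.nio.file` API rather than Hadoop's own FileSystem classes. It is not the attached patch. The class name `TmpThenRename`, the helper names, and the `_tmp.` prefix constant are assumptions for illustration, and the sketch simply overwrites a stale temporary file instead of reusing and checksumming it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class TmpThenRename {
    // Hypothetical prefix; the description only suggests "_tmp.<realname>" as a possibility.
    static final String TMP_PREFIX = "_tmp.";

    // Temporary sibling of dst, in the same directory so the final rename stays on one file system.
    static Path tmpPathFor(Path dst) {
        return dst.resolveSibling(TMP_PREFIX + dst.getFileName());
    }

    // Write all bytes to the temporary file first, then rename it to the real name.
    // If the process is interrupted during the copy, only the temporary file is left behind.
    static void copyThenRename(InputStream src, Path dst) throws IOException {
        Path tmp = tmpPathFor(dst);
        // Simplification: a leftover temporary file is overwritten, not reused.
        Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING);
        // ATOMIC_MOVE makes the rename all-or-nothing where the file system supports it;
        // whether an existing dst is replaced is implementation specific.
        Files.move(tmp, dst, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Keeping the temporary file in the destination directory is a deliberate choice here: a rename within one directory does not have to copy data across file systems, which is what allows the final step to be atomic.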

      Original suggestion:

      On Apr 23, 2007, at 2:38 AM, Richard Kasperski wrote:

      I'd like to have a guarantee that a file copy is both completed and that the file is whole. In the past I've done this by copying the file to a temporary name, tmp.<realname>, and then moving it to <realname> once the copy is complete. This has the following very nice properties: if <realname> exists, then the file copy is complete and I'm not looking at a partial copy of the file. I believe that the copy to the cluster has both of these properties, in that the file doesn't appear in a DFS directory until the whole file has been copied. The copy from the cluster to a local file system does not have these guarantees, and it would be very nice if it did. There are two scenarios in which I wish to use this. First, if I ctrl-c the 'hadoop dfs -copyToLocal', I know which parts are complete and which aren't. Second, I can run a background compressor to compress the files as they are copied.
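The second scenario depends on a consumer being able to tell finished files from in-progress ones by name alone. A small sketch of that consumer side, in plain Java and assuming the naming convention from the message above: the class `CompletedFiles`, the method `listCompleted`, and the `tmpPrefix` parameter are hypothetical names, not part of Hadoop.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class CompletedFiles {
    // Return the regular files in dir whose names do not start with tmpPrefix.
    // Under the copy-then-rename convention these are exactly the fully copied files,
    // so a background job (for example a compressor) can process them safely.
    static List<Path> listCompleted(Path dir, String tmpPrefix) throws IOException {
        List<Path> done = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                String name = p.getFileName().toString();
                if (Files.isRegularFile(p) && !name.startsWith(tmpPrefix)) {
                    done.add(p);
                }
            }
        }
        Collections.sort(done); // stable, name-ordered output
        return done;
    }
}
```

A caller would poll `CompletedFiles.listCompleted(localDir, "tmp.")` and hand each returned path to the compressor, with the prefix matching whatever the copy side uses.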

        Attachments

        1. HADOOP-1292_20070621c.patch
          11 kB
          Tsz-wo Sze

            People

            • Assignee:
              szetszwo Tsz-wo Sze
              Reporter:
              eric14 Eric Baldeschwieler