Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.5.1
None
None
Description
Current state and shortcoming
=======================
By default, distcp writes temporary files into $TARGET_PATH/.distcp.tmp/$taskAttemptId (see RetriableFileCopyCommand#getTmpFile), and users have no way to override this staging/tmp directory. The problem is most visible on S3 with versioning. For example, a user may want to enable S3 versioning only for the target directory but not for the staging/tmp directory. Because the staging directory sits under the target, distcp currently creates versions for it as well, even though it can contain a large number of temporary files. Letting users point the staging directory at a non-versioned S3 path would keep the versioned target free of these temporary files.
Proposed solution
==============
Add a new option, -stage, with which users can optionally specify a path on the target file system. DistCp mapper tasks will then write their temporary files into that directory instead of the default location.
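A minimal sketch of the path resolution the proposal implies, assuming the option value simply replaces the default root. The class `StagingPathSketch`, the method `resolveTmpDir` and the example `s3a://` paths are hypothetical; the fallback mirrors the default layout described in this issue, not the actual code of RetriableFileCopyCommand#getTmpFile.

```java
/**
 * Hypothetical sketch: how a DistCp mapper could choose the directory for its
 * temporary files once a -stage option exists. Names are illustrative only and
 * are not part of the real DistCp API.
 */
public final class StagingPathSketch {

    private StagingPathSketch() {}

    /**
     * Returns the directory where one task attempt writes its temporary files.
     *
     * @param targetPath    root of the copy target, e.g. "s3a://bucket/target"
     * @param stagePath     value of the proposed -stage option, or null if not given
     * @param taskAttemptId ID of the current task attempt
     */
    public static String resolveTmpDir(String targetPath, String stagePath, String taskAttemptId) {
        // Without -stage, keep the current behaviour: a hidden directory under the target.
        String root = (stagePath == null || stagePath.isEmpty())
                ? join(targetPath, ".distcp.tmp")
                : stagePath;
        // One subdirectory per task attempt, so concurrent attempts never share files.
        return join(root, taskAttemptId);
    }

    // Joins two path segments with exactly one '/' between them.
    private static String join(String parent, String child) {
        return parent.endsWith("/") ? parent + child : parent + "/" + child;
    }

    public static void main(String[] args) {
        String attempt = "attempt_1_0001_m_000000_0";
        // Default: temporary files land under the (versioned) target.
        System.out.println(resolveTmpDir("s3a://bucket/target", null, attempt));
        // With -stage: temporary files land in a separate, non-versioned path.
        System.out.println(resolveTmpDir("s3a://bucket/target", "s3a://bucket/distcp-stage", attempt));
    }
}
```

Running `main` prints `s3a://bucket/target/.distcp.tmp/attempt_1_0001_m_000000_0` followed by `s3a://bucket/distcp-stage/attempt_1_0001_m_000000_0`. Keeping a per-attempt subdirectory under the stage root preserves the isolation the default layout already gives each task attempt.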
Possible Confusions
=================
The existing -tmp option might appear to serve the same purpose, but it only works together with -atomic, where it names the intermediate work path used for the atomic commit. That is a different notion of temporary files.
The option could also be confused with the staging directory used by the MapReduce framework. The proposed directory is specific to DistCp temporary files.
I am working on a patch and will upload it.