Affects Version/s: 2.5.1
Fix Version/s: None
Current state and shortcoming
By default, distcp writes temporary files into $TARGET_PATH/.distcp.tmp/$taskattemptid (see RetriableFileCopyCommand#getTmpFile). A user has no way to override this staging/tmp directory. The problem is most visible on S3 with versioning enabled. For example, a user may want versioning only for the target location, not for the staging/tmp location. Because distcp currently places the staging directory under the target path, every temporary file written there is versioned as well, and there can be a lot of them. If the user could point staging at a non-versioned S3 path instead, the versioned target would not accumulate versions of temporary files.
Proposal: add a new option, -stage, with which the user can optionally supply a path on the target FS. Distcp mapper tasks would then write their temporary files into that directory instead of the default one.
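The path selection proposed here can be sketched in Java as follows. This is a hypothetical, simplified illustration, not a patch: the class TmpPathSketch, the method tmpFile and its stagingDir parameter (standing in for the value of the proposed -stage option) are invented for this example, paths are plain strings rather than org.apache.hadoop.fs.Path, and the default layout follows the $TARGET_PATH/.distcp.tmp/$taskattemptid description in this issue rather than the exact code of RetriableFileCopyCommand#getTmpFile.

```java
/**
 * Hypothetical sketch: choosing where a distcp task writes its temporary file.
 * Not the real RetriableFileCopyCommand#getTmpFile; paths are plain strings
 * and stagingDir stands in for the value of the proposed -stage option.
 */
public class TmpPathSketch {

    // Default directory created under the target path (as described in this issue).
    static final String DEFAULT_TMP_DIR = ".distcp.tmp";

    /**
     * @param targetPath    distcp target path, e.g. a versioned S3 location
     * @param stagingDir    value of the proposed -stage option, or null/empty if not given
     * @param taskAttemptId ID of the running map task attempt
     * @return path of the temporary file for this task attempt
     */
    static String tmpFile(String targetPath, String stagingDir, String taskAttemptId) {
        // Without -stage, fall back to the current layout under the target path.
        String base = (stagingDir == null || stagingDir.isEmpty())
                ? join(targetPath, DEFAULT_TMP_DIR)
                : stagingDir;
        return join(base, taskAttemptId);
    }

    // Joins two path segments, inserting '/' unless the parent already ends with one.
    static String join(String parent, String child) {
        return parent.endsWith("/") ? parent + child : parent + "/" + child;
    }

    public static void main(String[] args) {
        String attempt = "attempt_1_0001_m_000000_0";
        // prints s3a://bucket/target/.distcp.tmp/attempt_1_0001_m_000000_0
        System.out.println(tmpFile("s3a://bucket/target", null, attempt));
        // prints s3a://bucket/staging/attempt_1_0001_m_000000_0
        System.out.println(tmpFile("s3a://bucket/target", "s3a://bucket/staging", attempt));
    }
}
```

Falling back to the existing layout when no staging path is given keeps today's behaviour unchanged for users who do not pass the new option.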
The existing distcp option -tmp might appear to serve the same purpose. However, it only works together with the -atomic option, where it specifies the intermediate work path for the atomic commit, which is a different notion of temporary files.
Another possible source of confusion is the staging directory used by the MapReduce framework. The proposed directory is specific to distcp and is separate from it.
I am working on a patch and will upload it.