Hadoop Map/Reduce / MAPREDUCE-6713

Distcp doesn't provide any option to override the default staging directory


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.5.1
    • Fix Version/s: None
    • Component/s: distcp
    • Labels:
      None

      Description

      Current state and shortcoming
      =======================
      By default, distcp writes temporary files to $TARGET_PATH/.distcp.tmp/$taskAttemptId (see RetriableFileCopyCommand#getTmpFile), and users have no way to override this staging/tmp directory. The problem is most visible on S3 with versioning enabled. For example, a user may want to enable S3 versioning only for the target directory, not for the staging/tmp directory. Currently, distcp also creates versions for the staging directory, which can contain many temporary files. If users could point staging at a non-versioned S3 path instead, the versioned target would stay clean.
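      To make the current layout concrete, here is a minimal, dependency-free sketch of the path scheme described above. It is an illustration only, not the actual RetriableFileCopyCommand#getTmpFile code: plain strings stand in for Hadoop's Path, and the class name DefaultTmpPath is hypothetical.

```java
// Hypothetical sketch of the default layout described in this issue:
// every copy task writes to $TARGET_PATH/.distcp.tmp/$taskAttemptId.
// Plain strings stand in for org.apache.hadoop.fs.Path; this is NOT
// the real RetriableFileCopyCommand#getTmpFile implementation.
final class DefaultTmpPath {

    private static final String TMP_DIR_NAME = ".distcp.tmp";

    /** Joins two path components with '/', avoiding a doubled separator. */
    static String join(String parent, String child) {
        return parent.endsWith("/") ? parent + child : parent + "/" + child;
    }

    /** Temporary file location under the default (non-overridable) scheme. */
    static String tmpFile(String targetPath, String taskAttemptId) {
        return join(join(targetPath, TMP_DIR_NAME), taskAttemptId);
    }

    public static void main(String[] args) {
        // The temporary file always lands under the (possibly versioned) target.
        System.out.println(tmpFile("s3a://bucket/data", "attempt_1_0001_m_000000_0"));
    }
}
```

      The point of the sketch is that the only input is the target path, so whatever versioning policy applies to the target also applies to every temporary file.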

      Proposed solution
      ==============
      Provide a new option (-stage) with which the user can optionally specify a directory on the target file system. Distcp mapper tasks will then write their temporary files into that directory instead of under the target path.
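      The proposed behavior could look roughly like the following sketch, assuming the -stage value is simply used as the parent of each task's temporary file and the current default applies when the option is absent. StagePathResolver and its parameters are hypothetical names for illustration; they do not exist in distcp, and this is not the patch itself.

```java
import java.util.Optional;

// Hypothetical sketch of the proposed -stage option: when a staging
// directory is supplied, per-task temporary files go there instead of
// under the target path. All names here are illustrative only.
final class StagePathResolver {

    private static final String DEFAULT_TMP_DIR_NAME = ".distcp.tmp";

    /** Joins two path components with '/', avoiding a doubled separator. */
    static String join(String parent, String child) {
        return parent.endsWith("/") ? parent + child : parent + "/" + child;
    }

    /**
     * Returns the temporary file for one task attempt.
     *
     * @param targetPath    the distcp target directory
     * @param stageDir      value of the proposed -stage option, if given
     * @param taskAttemptId the mapper's task attempt ID
     */
    static String tmpFile(String targetPath, Optional<String> stageDir, String taskAttemptId) {
        // Fall back to the current default layout when -stage is absent.
        String base = stageDir.orElseGet(() -> join(targetPath, DEFAULT_TMP_DIR_NAME));
        return join(base, taskAttemptId);
    }

    public static void main(String[] args) {
        String attempt = "attempt_1_0001_m_000000_0";
        // Without -stage: behavior is unchanged.
        System.out.println(tmpFile("s3a://versioned/data", Optional.empty(), attempt));
        // With -stage: temporary files go to the separate staging location.
        System.out.println(tmpFile("s3a://versioned/data", Optional.of("s3a://scratch/distcp-stage"), attempt));
    }
}
```

      Keeping the old path as the fallback means existing jobs that do not pass -stage see no change.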

      Possible Confusions
      =================
      Another distcp option, -tmp, might appear to serve the same purpose. However, it takes effect only together with the -atomic option, where it names the intermediate work path used for the atomic commit, which is a different notion of temporary files.
      A second possible confusion is the staging directory used by the MapReduce framework itself. The proposed directory is specific to distcp's temporary copy files.

      I am working on a patch and will upload it.

            People

            • Assignee: Mohammad Kamrul Islam (kamrul)
            • Reporter: Mohammad Kamrul Islam (kamrul)
            • Votes: 0
            • Watchers: 5
