  1. Hadoop Common
  2. HADOOP-18739

Parallelize concatenation of distcp chunks of separate files in CopyCommitter

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.6
    • Fix Version/s: None
    • Component/s: tools/distcp

    Description

      When DistCp copies a folder containing large files that are each split into multiple chunks, CopyCommitter concatenates the chunks sequentially, one file at a time. This step can be sped up by concatenating the chunks of separate files in parallel. With this improvement, we save 2-3 minutes when copying a 100 GB folder containing 20 files of 5 GB each.

      Contributing a patch for this.
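
      The idea can be illustrated with a minimal sketch. This is not the attached patch and not the actual CopyCommitter code: the class `ParallelChunkConcat`, the interface `ChunkConcatenator`, and the `numThreads` parameter are hypothetical names introduced here for illustration. The sketch assumes the chunks have already been grouped by target file and ordered. Each file gets one task, so chunks within a file are still concatenated in order while separate files proceed in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelChunkConcat {

    /**
     * Hypothetical stand-in for the per-file concat step. In a real
     * implementation this might wrap a call such as FileSystem.concat;
     * that mapping is an assumption, not taken from the issue.
     */
    interface ChunkConcatenator {
        void concat(String targetFile, List<String> orderedChunks) throws Exception;
    }

    /**
     * Concatenates the chunks of every file. Separate files run in parallel
     * on a fixed-size pool; the chunks of a single file are handled by one
     * task, so their order is preserved.
     */
    static void concatAll(Map<String, List<String>> chunksByFile,
                          ChunkConcatenator concatenator,
                          int numThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Map.Entry<String, List<String>> entry : chunksByFile.entrySet()) {
                // One task per target file; returning null makes this a
                // Callable, so checked exceptions can propagate.
                futures.add(pool.submit(() -> {
                    concatenator.concat(entry.getKey(), entry.getValue());
                    return null;
                }));
            }
            // Wait for all files; get() rethrows a task failure wrapped
            // in an ExecutionException, which fails the whole commit step.
            for (Future<?> future : futures) {
                future.get();
            }
        } finally {
            pool.shutdownNow();
        }
    }
}
```

      A caller would pass a map from target file to its ordered chunk list, a concatenator implementation, and a thread count. Waiting on every future before returning keeps the commit step synchronous from the caller's point of view, and any single failure still surfaces as an exception.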

            People

              Assignee: Unassigned
              Reporter: Abhay Yadav (abhay.yadav)
              Votes: 0
              Watchers: 2
