Hadoop Common / HADOOP-14086

Improve DistCp Speed for small files


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.6.5
    • Fix Version/s: None
    • Component/s: tools/distcp
    • Labels: None

    Description

      When DistCp copies a large number of small files, the NameNode naturally becomes a bottleneck.

      The current DistCp code is not optimized to minimize NameNode calls. We should restructure it to make as few NameNode calls as possible, which should speed up copying small files.
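
      To illustrate the kind of saving in question, here is a minimal sketch in Python. It is not DistCp's code and does not use the Hadoop API: `FakeNameNode`, `sizes_per_file`, and `sizes_batched` are hypothetical names, and the in-memory class only counts how many metadata calls each strategy would issue. It assumes all files share one parent directory and that one listing call returns every child; a real listing of a very large directory may take more than one call.

```python
from collections import Counter


class FakeNameNode:
    """Hypothetical in-memory stand-in for a NameNode that counts metadata calls."""

    def __init__(self, files_by_dir):
        # files_by_dir maps a directory path to {file name: size in bytes}.
        self.files_by_dir = files_by_dir
        self.calls = Counter()

    def get_file_status(self, directory, name):
        # Models one call per file.
        self.calls["get_file_status"] += 1
        return self.files_by_dir[directory][name]

    def list_status(self, directory):
        # Models one call that returns the status of every child.
        self.calls["list_status"] += 1
        return dict(self.files_by_dir[directory])


def sizes_per_file(nn, directory, names):
    """Look up each file individually: len(names) calls."""
    return {name: nn.get_file_status(directory, name) for name in names}


def sizes_batched(nn, directory, names):
    """Fetch one directory listing and filter it locally: one call."""
    listing = nn.list_status(directory)
    return {name: listing[name] for name in names}


# Compare the two strategies on 1000 small files in one directory.
files = {"/src": {f"part-{i:05d}": 1024 for i in range(1000)}}
names = list(files["/src"])

nn = FakeNameNode(files)
per_file_result = sizes_per_file(nn, "/src", names)
per_file_calls = sum(nn.calls.values())

nn = FakeNameNode(files)
batched_result = sizes_batched(nn, "/src", names)
batched_calls = sum(nn.calls.values())

print(per_file_calls, batched_calls)
```

      Both strategies return the same metadata, but the per-file version issues one call per file while the batched version issues one per directory. Whether DistCp can be restructured along these lines is exactly what this issue proposes to investigate.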

            People

              Assignee: zshao Zheng Shao
              Reporter: zshao Zheng Shao
              Votes: 1
              Watchers: 13
