Hadoop Common / HADOOP-13975

Allow DistCp to use MultiThreadedMapper

    Details

    • Type: New Feature
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0-alpha1
    • Fix Version/s: None
    • Component/s: tools/distcp
    • Labels:
      None

      Description

      Although DistCp allows users to control parallelism through the number of mappers, it is sometimes desirable to run fewer mappers with more threads per mapper. Because DistCp is network bound (by throughput or, more often, by the latency of creating connections and of opening, reading, writing, and closing files), running several threads lets each mapper overlap those waits and work much more efficiently. When WebHDFS is used as either the source or the target, the multithreaded approach can also make HTTP connection reuse (to the NameNode) more efficient.

      Threads within a single mapper can share many resources, which saves memory and reduces the number of connections to the NameNode.
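
      The idea of "fewer mappers, more threads per mapper" can be illustrated with a minimal sketch. This is not DistCp code or the attached patch: the class `ThreadedCopySketch` and the method `copyAll` are hypothetical names, and the sketch copies local files with plain `java.nio` and `java.util.concurrent` rather than Hadoop's `FileSystem` API. It only shows how one worker can overlap several latency-bound copies through a fixed-size thread pool.

      ```java
      import java.io.IOException;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.nio.file.StandardCopyOption;
      import java.util.ArrayList;
      import java.util.List;
      import java.util.concurrent.Callable;
      import java.util.concurrent.ExecutionException;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;
      import java.util.concurrent.Future;

      /** Hypothetical sketch: one "mapper" copies its share of files with a thread pool. */
      class ThreadedCopySketch {

          /**
           * Copies every source file into targetDir using the given number of threads.
           * Returns the number of files copied; fails on the first copy error.
           */
          static int copyAll(List<Path> sources, Path targetDir, int threads)
                  throws IOException, InterruptedException {
              ExecutorService pool = Executors.newFixedThreadPool(threads);
              try {
                  // Submit one copy task per file; the pool runs up to `threads` at once.
                  List<Future<Path>> pending = new ArrayList<>();
                  for (Path source : sources) {
                      Callable<Path> task = () -> Files.copy(
                              source,
                              targetDir.resolve(source.getFileName()),
                              StandardCopyOption.REPLACE_EXISTING);
                      pending.add(pool.submit(task));
                  }
                  // Wait for every task and surface any copy failure to the caller.
                  int copied = 0;
                  for (Future<Path> result : pending) {
                      try {
                          result.get();
                          copied++;
                      } catch (ExecutionException e) {
                          throw new IOException("copy failed", e.getCause());
                      }
                  }
                  return copied;
              } finally {
                  pool.shutdown();
              }
          }
      }
      ```

      Calling `ThreadedCopySketch.copyAll(files, targetDir, 8)` would copy the listed files with up to eight copies in flight. In a real job the thread count would presumably be a per-mapper setting, and the threads would share the mapper's memory and client connections, which is where the savings described above would come from.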

            People

            • Assignee:
              Zheng Shao (zshao)
              Reporter:
              Zheng Shao (zshao)
            • Votes:
              1
              Watchers:
              12
