  1. Hadoop Map/Reduce
  2. MAPREDUCE-2117

Superfast Distcp when copying data within the same hdfs cluster

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: distcp
    • Labels: None

    Description

      There are use cases where distcp is used to copy a set of files or directories from one part of the HDFS namespace to another part within the same HDFS cluster. Such a copy could be made much faster if distcp could instruct the relevant datanodes to make local replicas of the relevant blocks, keeping network usage to a minimum. This would be especially useful for letting HBase take a backup of a region with minimal downtime.
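
      The fast path described above only applies when source and destination live in the same HDFS cluster. As a minimal sketch of how that precondition might be checked, the following compares the scheme and authority of two paths using only `java.net.URI`. The class name `SameClusterCheck`, the method `sameHdfsCluster`, and the example hostnames are hypothetical and not part of distcp; a real implementation would more likely compare resolved FileSystem instances.

```java
import java.net.URI;

// Hypothetical helper, not part of Hadoop or distcp.
class SameClusterCheck {

    // Returns true only when both URIs use the hdfs scheme and name the
    // same authority (namenode host and port). The check is deliberately
    // conservative: hdfs://nn1/ and hdfs://nn1:8020/ compare as different,
    // so a caller would fall back to a regular copy in that case.
    static boolean sameHdfsCluster(URI src, URI dst) {
        if (!"hdfs".equalsIgnoreCase(src.getScheme())) {
            return false;
        }
        if (!"hdfs".equalsIgnoreCase(dst.getScheme())) {
            return false;
        }
        String srcAuthority = src.getAuthority();
        String dstAuthority = dst.getAuthority();
        return srcAuthority != null && srcAuthority.equalsIgnoreCase(dstAuthority);
    }

    public static void main(String[] args) {
        // Example hostnames and paths are made up for illustration.
        URI src = URI.create("hdfs://nn1:8020/hbase/table/region");
        URI sameCluster = URI.create("hdfs://nn1:8020/backup/table/region");
        URI otherCluster = URI.create("hdfs://nn2:8020/backup/table/region");

        System.out.println(sameHdfsCluster(src, sameCluster));
        System.out.println(sameHdfsCluster(src, otherCluster));
    }
}
```

      When the check fails, the copy would simply proceed as an ordinary distcp job, so a false negative costs performance but not correctness.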

            People

              Assignee: Unassigned
              Reporter: Dhruba Borthakur
              Votes: 0
              Watchers: 11

              Dates

                Created:
                Updated: