Hadoop Common / HADOOP-15887

Add an option to avoid writing data locally in Distcp

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.8.2, 3.0.0
    • Fix Version/s: None
    • Component/s: tools/distcp
    • Labels: None
    • Target Version/s:

      Description

      When a large amount of data is copied from one cluster to another via Distcp, and the Distcp job runs in the target cluster, disk usage across the target datanodes becomes imbalanced. This happens because the default block placement policy stores the first replica on the local node, that is, the datanode on which the writing task runs.

      In https://issues.apache.org/jira/browse/HDFS-3702 a flag was added to DFSClient to avoid placing a replica on the local datanode. Distcp can make use of this flag.
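
To illustrate the imbalance described above, here is a minimal toy model, not HDFS code. It assumes that copy tasks run on only a few of the target datanodes and tracks only where the first replica of each block lands. The function name, its parameters, and the node names are hypothetical. The `avoid_local` switch stands in for the DFSClient flag from HDFS-3702 (exposed in the Hadoop API as `CreateFlag.NO_LOCAL_WRITE`, as far as I know), and the random choice among the remaining nodes is a simplification of the real placement policy.

```python
import random
from collections import Counter


def place_first_replicas(num_blocks, datanodes, writer_nodes, avoid_local, seed=0):
    """Count first replicas per datanode in a simplified placement model.

    Assumption: with the default policy the first replica always goes to the
    writer's own node; with avoid_local=True it goes to a random other node.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    counts = Counter({dn: 0 for dn in datanodes})
    for _ in range(num_blocks):
        writer = rng.choice(writer_nodes)  # node running the copy task
        if avoid_local:
            # Exclude the writer's own node, then pick any other datanode.
            candidates = [dn for dn in datanodes if dn != writer]
            target = rng.choice(candidates)
        else:
            # Default behaviour in this model: first replica stays local.
            target = writer
        counts[target] += 1
    return counts


datanodes = [f"dn{i}" for i in range(10)]
writers = ["dn0", "dn1"]  # copy tasks run on two nodes only

default_counts = place_first_replicas(1000, datanodes, writers, avoid_local=False)
spread_counts = place_first_replicas(1000, datanodes, writers, avoid_local=True)
print("default:    ", dict(default_counts))
print("avoid local:", dict(spread_counts))
```

In this model the default run puts every first replica on the two writer nodes, while the run with `avoid_local=True` distributes them over the other datanodes as well. Real clusters also place second and third replicas and account for racks and free space, which this sketch ignores.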

        Attachments

        1. HADOOP-15887.001.patch
          11 kB
          Tao Jie
        2. HADOOP-15887.002.patch
          18 kB
          Tao Jie

          Activity

            People

            • Assignee: Tao Jie
            • Reporter: Tao Jie
            • Votes: 0
            • Watchers: 2

              Dates

              • Created:
                Updated: