Hadoop Common / HADOOP-15887

Add an option to avoid writing data locally in Distcp


Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.8.2, 3.0.0
    • Fix Version/s: None
    • Component/s: tools/distcp
    • Labels: None

    Description

      When a large amount of data is copied from one cluster to another with Distcp and the Distcp jobs run on the target cluster, disk usage across the target datanodes becomes imbalanced. The cause is the default block placement policy, which stores the first replica on the local datanode (the node running the writing map task), so the nodes that host map tasks accumulate more data than the rest.

      HDFS-3702 (https://issues.apache.org/jira/browse/HDFS-3702) added a flag to DFSClient that avoids writing a replica to the local datanode. Distcp can make use of this flag.
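The imbalance described above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not HDFS code: a single rack, uniform random choice of remote nodes, and no load or free-space checks. `PlacementSketch`, `placeBlock`, and `simulate` are hypothetical names. The `noLocalWrite` switch stands in for the HDFS-3702 flag (exposed in Hadoop 2.8+ as `CreateFlag.NO_LOCAL_WRITE`) and is modelled here simply as excluding the writer's node from every replica.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Toy model of first-replica placement; NOT the HDFS BlockPlacementPolicyDefault code.
// Assumptions: one rack, no load/space checks, uniform random choice of remote nodes.
public class PlacementSketch {

    // Chooses `replication` distinct datanodes (0..numNodes-1) for one block written from `writer`.
    static List<Integer> placeBlock(int writer, int numNodes, int replication,
                                    boolean noLocalWrite, Random rnd) {
        List<Integer> others = new ArrayList<>();
        for (int n = 0; n < numNodes; n++) {
            if (n != writer) {
                others.add(n);
            }
        }
        Collections.shuffle(others, rnd);
        List<Integer> targets = new ArrayList<>();
        if (!noLocalWrite) {
            targets.add(writer); // default behaviour: first replica on the writer's own node
        }
        // Remaining replicas (all of them when noLocalWrite is set) go to random other nodes.
        for (int i = 0; targets.size() < replication; i++) {
            targets.add(others.get(i));
        }
        return targets;
    }

    // Counts replicas stored per node after writer nodes 0..numWriters-1 each write
    // `blocksPerWriter` blocks (standing in for Distcp map tasks on target datanodes).
    static int[] simulate(int numNodes, int numWriters, int blocksPerWriter,
                          int replication, boolean noLocalWrite, long seed) {
        if (replication > numNodes - 1) {
            throw new IllegalArgumentException("need at least replication + 1 nodes");
        }
        Random rnd = new Random(seed);
        int[] stored = new int[numNodes];
        for (int w = 0; w < numWriters; w++) {
            for (int b = 0; b < blocksPerWriter; b++) {
                for (int node : placeBlock(w, numNodes, replication, noLocalWrite, rnd)) {
                    stored[node]++;
                }
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        // 20 datanodes, 4 of them running writers, 1000 blocks each, replication 3.
        for (boolean noLocalWrite : new boolean[] {false, true}) {
            int[] stored = simulate(20, 4, 1000, 3, noLocalWrite, 42L);
            System.out.printf("noLocalWrite=%b: node0=%d max=%d min=%d%n", noLocalWrite,
                    stored[0],
                    Arrays.stream(stored).max().getAsInt(),
                    Arrays.stream(stored).min().getAsInt());
        }
    }
}
```

With the default policy every writer node holds one replica of each block it wrote on top of its share of remote replicas, so it ends up well above the other nodes; with `noLocalWrite` set, the writes are spread over the remaining nodes and the gap between `max` and `min` shrinks.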

      Attachments

        1. HADOOP-15887.005.patch (18 kB, Tao Jie)
        2. HADOOP-15887.004.patch (17 kB, Tao Jie)
        3. HADOOP-15887.003.patch (17 kB, Tao Jie)
        4. HADOOP-15887.002.patch (18 kB, Tao Jie)
        5. HADOOP-15887.001.patch (11 kB, Tao Jie)


            People

              Assignee: Tao Jie
              Reporter: Tao Jie
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated: