Hadoop HDFS / HDFS-13992

Cross-cluster rack awareness for distcp

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.8.4, 3.1.1, 3.0.3, 2.7.7
    • Fix Version/s: None
    • Component/s: None

      Description

      It would be great if distcp supported cross-cluster rack awareness.

      For example, suppose we have two HDFS clusters, cluster1 and cluster2.
      Both clusters span three switches, and both have rack awareness enabled.
      Both clusters also name the same switches the same way.

      Then, when distcp runs a data replication job, it could write HDFS blocks
      only to the counterpart datanodes on the destination cluster that are attached to the same
      physical network switch, minimizing latency and maximizing bandwidth.
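
      The matching idea can be sketched in a few lines. This is only an illustration of the request, not how distcp or HDFS block placement works today, and it uses no Hadoop API. The hostnames, the rack names, and the functions `group_by_rack` and `pick_counterparts` are all hypothetical; the sketch assumes each cluster's topology is available as a plain datanode-to-rack mapping and that both clusters use identical rack names.

```python
from collections import defaultdict


def group_by_rack(node_to_rack):
    """Invert a {datanode: rack} mapping into {rack: [datanodes]}."""
    racks = defaultdict(list)
    for node, rack in sorted(node_to_rack.items()):
        racks[rack].append(node)
    return dict(racks)


def pick_counterparts(source_replicas, source_topology, dest_topology):
    """Pair each source replica with an unused destination node on the same rack.

    Returns {source_node: dest_node or None}. None means the destination
    cluster has no free node on that rack, so a fallback would be needed.
    """
    dest_by_rack = group_by_rack(dest_topology)
    used = set()
    plan = {}
    for src in source_replicas:
        rack = source_topology[src]
        candidates = [n for n in dest_by_rack.get(rack, []) if n not in used]
        choice = candidates[0] if candidates else None
        if choice is not None:
            used.add(choice)
        plan[src] = choice
    return plan


# Hypothetical topologies: both clusters name the three switches identically.
source_topology = {"c1-dn1": "/switch1", "c1-dn2": "/switch2", "c1-dn3": "/switch3"}
dest_topology = {
    "c2-dn1": "/switch1",
    "c2-dn2": "/switch2",
    "c2-dn3": "/switch3",
    "c2-dn4": "/switch1",
}

# One block with replication 3, one replica per switch on the source cluster.
plan = pick_counterparts(["c1-dn1", "c1-dn2", "c1-dn3"], source_topology, dest_topology)
```

      With these inputs every replica is paired with a destination datanode behind the same switch, so no copy has to cross between switches. A rack name that is missing on the destination side yields `None`, which is where a real feature would have to fall back to ordinary placement.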

      This could be an option, activated through a `distcp` command-line switch.
      We have multiple clusters with a default replication factor of 3, and all of those clusters live in the same three "racks" (top-of-rack switches).

      This could drastically reduce inter-switch network traffic during huge distcp jobs.

              People

              • Assignee: Unassigned
              • Reporter: Tagar Ruslan Dautkhanov
              • Votes: 0
              • Watchers: 5
