[HDFS-13992] cross-cluster rack awareness for distcp - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.8.4, 3.1.1, 3.0.3, 2.7.7
Fix Version/s: None
Component/s: None
Labels:
- distcp
- rack-awareness

Description

It would be great if distcp supported cross-cluster rack awareness.

For example, we have two HDFS clusters, cluster1 and cluster2.
Both clusters span three switches, and both have rack awareness enabled.
Both clusters also name the same switches the same way.

So when distcp runs a data replication job, it could copy HDFS blocks
only to counterpart datanodes on the destination cluster that sit behind the same physical network
switch, minimizing latency and maximizing bandwidth.

This could be an option, activated through a `distcp` command-line switch.
We have multiple clusters with a default replication factor of 3, and all of those clusters live in the same three "racks" / top-of-rack switches.

This could drastically reduce inter-switch network traffic during huge distcp jobs.
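
The matching step described above could look roughly like the sketch below. It is only an illustration of the idea, not existing DistCp or HDFS behavior: `RackMatcher` and `sameRackTargets` are hypothetical names, and the sketch assumes each destination datanode's rack is available as a plain string (for example `/switch1`) that means the same thing on both clusters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Hadoop or DistCp. */
final class RackMatcher {
    private RackMatcher() {}

    /**
     * Returns the destination datanodes whose rack name equals the rack of the
     * source replica, sorted by hostname. An empty list means there is no
     * counterpart rack on the destination cluster.
     *
     * Assumption: both clusters use identical rack names for the same switch.
     */
    static List<String> sameRackTargets(String sourceRack,
                                        Map<String, String> destNodeToRack) {
        List<String> targets = new ArrayList<>();
        for (Map.Entry<String, String> entry : destNodeToRack.entrySet()) {
            if (entry.getValue().equals(sourceRack)) {
                targets.add(entry.getKey());
            }
        }
        Collections.sort(targets);
        return targets;
    }
}
```

When the returned list is empty, a copy job would presumably fall back to the destination cluster's normal block placement rather than fail.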

Issue Links

relates to

HADOOP-13031 Rack-aware read bytes stats should be managed by HFDS specific StorageStatistics

Open

HADOOP-13032 Refactor FileSystem$Statistics to use StorageStatistics

Open

HADOOP-15125 Complete integration of new StorageStatistics

Open

HDFS-9579 Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level

Resolved

People

Assignee: Unassigned

Reporter: Ruslan Dautkhanov

Votes: 0

Watchers: 6

Dates

Created: 15/Oct/18 00:28

Updated: 15/Oct/18 00:31