Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      There are many instances when the same piece of data resides on multiple HDFS clusters in different data centers, primarily because the physical capacity of a single data center is insufficient to host the entire data set. In that case, the administrators typically partition the data across two (or more) HDFS clusters in two different data centers and then duplicate some subset of that data into both clusters.

      In such a situation, there will be six physical copies of the duplicated data: three copies in one data center and another three in the other. It would be nice if we could keep fewer than 3 replicas in each data center and have the ability to repair a replica in the local data center by copying data from the remote data center. For example, keeping 2 replicas per data center would reduce the duplicated data from 6 physical copies to 4, a 33% saving.

        Issue Links

          Activity

          Jeff Hammerbacher added a comment -

          > BTW, what is a BCP cluster?

          http://en.wikipedia.org/wiki/Business_continuity_planning
          Sriram Rao added a comment -

          > Performance: Map-reduce jobs could have a performance impact if the number of replicas is reduced from 3 to 2. So, the tradeoff is reducing the total amount of storage while possibly increasing job latencies.

          With 2 copies in 2 racks, you are still preserving rack locality. That may be sufficient.

          dhruba borthakur added a comment -

          > Or, you want the same data-set in multiple data-centres for BCP clusters, but do not want to store 6 copies

          Yes, that sounds right. BTW, what is a BCP cluster?

          Joydeep Sen Sarma added a comment -

          Who manages the replication in both data centers? It seems simpler to put the logic for asserting equivalence at the same layer that manages this replication (at the time of replicating the data, the replication agent knows that the two sides are now equivalent).

          The follow-up question then is: why doesn't the HighTideNode itself take care of the cross-data-center file replication?

          The other problem with choosing one of the two file systems and then passing that down to the client is that the client cannot recover from bad/missing block problems that have yet to be healed by the system. It seems better to have a HighTide client-side file system layer that can switch to the surviving copy in the other data center (just like HDFS-RAID).
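
          A minimal sketch of what such a client-side layer could look like, assuming the local and remote clusters are reachable as ordinary Hadoop FileSystem instances (the class name HighTideClientFs and the failover-on-open behaviour are illustrative, not part of any existing HighTide code; a real layer would also have to fail over in the middle of a read):

          import java.io.IOException;
          import java.net.URI;
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FSDataInputStream;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          // Hypothetical client-side wrapper: read from the local cluster and
          // fall back to the equivalent path on the remote cluster when the
          // local copy cannot be opened.
          public class HighTideClientFs {
            private final FileSystem localFs;
            private final FileSystem remoteFs;

            public HighTideClientFs(URI localUri, URI remoteUri, Configuration conf)
                throws IOException {
              this.localFs = FileSystem.get(localUri, conf);
              this.remoteFs = FileSystem.get(remoteUri, conf);
            }

            public FSDataInputStream open(Path path) throws IOException {
              try {
                return localFs.open(path);
              } catch (IOException localFailure) {
                // The local copy is unreadable; try the same path in the
                // remote data center.
                return remoteFs.open(path);
              }
            }
          }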

          In short, I think things seem more consistent if HighTide (node/client) takes care of discovery/replication/recovery of data transparently across the two data centers (with some policy on which parts of the namespace need replication to which data centers).

          In certain cases we do not want complete location transparency. For example, a client may want to execute a job only when all required data sets have been replicated to the data center where the job will run. That can be done easily by having new APIs (provided by the HighTideNode) that report the available data centers for a given part of the namespace; the caller can then wait for availability at the desired data center.
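
          For example, such an API (purely hypothetical names, something the HighTideNode could expose) might look like:

          import java.io.IOException;
          import java.util.Set;
          import org.apache.hadoop.fs.Path;

          // Hypothetical HighTideNode client API for data-center awareness.
          public interface HighTideAvailability {
            // Data centers that currently hold a complete, up-to-date copy of
            // everything under this part of the namespace.
            Set<String> getAvailableDataCenters(Path namespaceRoot) throws IOException;

            // Block until the given data center has a complete copy or the timeout
            // expires; returns true if the data became available in time.
            boolean waitForAvailability(Path namespaceRoot, String dataCenter,
                                        long timeoutMillis) throws IOException;
          }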

          The other aspect that is not addressed here is concurrent modification (regardless of who manages the replication, there is replication lag and we have to account for it). I will assume for now that HighTide manages replication and gives the appearance of a single file system. A CRC checksum can distinguish a modified file tree from the original, but it cannot figure out which of the two is the later one.

          At least for our use case, a simple 'latest wins' policy is good enough:

          • The client application generates a file id by means external to HighTide.
          • It creates files/directories stamped with this id.
          • When HighTide replicates data to a remote data center, before overwriting the existing data (if any) it checks whether the incoming id is bigger than the existing one, and only then performs the overwrite.

          For us, an id encoding the timestamp will be sufficient (since we have a central entity doling out timestamps). But in general we would want to keep this abstract (more generally, at least a <Data-Center, time> tuple would be required).
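
          A minimal sketch of that comparison, assuming ids are <data-center, time> tuples with externally issued timestamps (the class and method names are illustrative):

          import java.util.Comparator;

          // Illustrative replication id: a <data-center, time> tuple, ordered by
          // timestamp first, with the data-center name breaking ties deterministically.
          public final class ReplicationId implements Comparable<ReplicationId> {
            final String dataCenter;
            final long timestamp;   // issued by an external central entity

            public ReplicationId(String dataCenter, long timestamp) {
              this.dataCenter = dataCenter;
              this.timestamp = timestamp;
            }

            @Override
            public int compareTo(ReplicationId other) {
              return Comparator.comparingLong((ReplicationId id) -> id.timestamp)
                  .thenComparing(id -> id.dataCenter)
                  .compare(this, other);
            }

            /** 'Latest wins': overwrite only if the incoming id is strictly newer. */
            public static boolean shouldOverwrite(ReplicationId existing, ReplicationId incoming) {
              return incoming.compareTo(existing) > 0;
            }
          }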

          Another option that may be feasible in certain scenarios is that HighTide prevents concurrent updates to a part of the namespace that is required to be replicated but has not been replicated so far. However, this makes the system very tightly coupled and will not have good availability characteristics.

          dhruba borthakur added a comment - edited

          Goal: The goal of the HighTideNode is to keep only one physical replica per data center. This is mostly for older files that change very infrequently. The HighTide server watches over the two HDFS namespaces from two different NameNodes in two different data centers. These two equivalent namespaces will be populated via means that are external to HighTide. The HighTide server verifies (via checksums of the crc files) that two directories in the two HDFS clusters contain identical data, and if so, reduces the replication factor to 2 on both clusters. (One or both clusters could be using HDFS-RAID too.) The HighTideNode monitors for missing replicas on both NameNodes, and if it finds any, it fixes them by copying data from the other NameNode in the remote data center.
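
          A rough sketch of the per-file equivalence check and replication reduction, assuming both clusters are reachable as Hadoop FileSystem instances (getFileChecksum and setReplication are existing FileSystem calls; note that HDFS file checksums are only comparable when block size and bytes-per-checksum match on both clusters):

          import java.io.IOException;
          import org.apache.hadoop.fs.FileChecksum;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          // Sketch: if the same file has a matching checksum in both clusters,
          // drop each cluster's replication factor from 3 to 2.
          public class EquivalenceChecker {
            static void reduceIfEquivalent(FileSystem fs1, FileSystem fs2, Path file)
                throws IOException {
              FileChecksum c1 = fs1.getFileChecksum(file);
              FileChecksum c2 = fs2.getFileChecksum(file);
              if (c1 != null && c1.equals(c2)) {
                fs1.setReplication(file, (short) 2);
                fs2.setReplication(file, (short) 2);
              }
            }
          }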

          In short, replication within an HDFS cluster will occur via the NameNode as usual. Each NameNode will maintain fewer than 3 copies of the data. Replication across HDFS clusters will be coordinated by the HighTideNode, which invokes the -list-corruptFiles RPC on each NameNode periodically (every minute) to detect missing replicas.

          DataNodeGateway: I envision a single HighTideNode coordinating replication between multiple HDFS clusters. An alternative would be a gateway approach: a specialized DataNode that exports the DataNode protocol and appears to an HDFS cluster as one very large DataNode, but instead of storing blocks on local disks, the GatewayDataNode would store data in a remote HDFS cluster. This is similar to existing NFS gateways, e.g. NFS-CIFS interaction. The downside is that this design is more complex and intrusive to HDFS, rather than being a layer on top of it.

          Mean-Time-To-Recover (MTR): Will this approach of having remote replicas increase the probability of data loss? My claim is that we should try to keep the MTR practically the same as it is today. If all the replicas of a block on HDFS1 go missing, then the HighTideNode will first increase the replication factor of the equivalent file in HDFS2. This ensures that we get back to 3 overall copies as soon as possible, keeping the MTR the same as it is now. Then the HighTideNode will copy this block from HDFS2 to HDFS1 and wait for HDFS1 to attain a replica count of 2 before decreasing the replica count on HDFS2 from 3 back to 2.
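
          A sketch of that repair sequence for a single file, assuming plain FileSystem handles for the damaged and healthy clusters and using FileUtil.copy as a stand-in for whatever block-level copy mechanism the HighTideNode would actually employ:

          import java.io.IOException;
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.BlockLocation;
          import org.apache.hadoop.fs.FileStatus;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.FileUtil;
          import org.apache.hadoop.fs.Path;

          // Sketch of the cross-cluster repair sequence described above.
          public class CrossClusterRepair {
            static void repair(FileSystem damagedFs, FileSystem healthyFs, Path file,
                               Configuration conf) throws IOException, InterruptedException {
              // 1. Get back to 3 overall copies quickly: bump the healthy cluster to 3.
              healthyFs.setReplication(file, (short) 3);

              // 2. Re-copy the file into the damaged cluster from the healthy one.
              //    (A real HighTideNode would presumably repair individual blocks.)
              FileUtil.copy(healthyFs, file, damagedFs, file, false, true, conf);

              // 3. Wait until every block in the damaged cluster has at least 2 replicas.
              while (minReplicaCount(damagedFs, file) < 2) {
                Thread.sleep(10000L);
              }

              // 4. Drop the healthy cluster back down to 2 replicas.
              healthyFs.setReplication(file, (short) 2);
            }

            private static int minReplicaCount(FileSystem fs, Path file) throws IOException {
              FileStatus status = fs.getFileStatus(file);
              BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
              int min = Integer.MAX_VALUE;
              for (BlockLocation block : blocks) {
                min = Math.min(min, block.getHosts().length);
              }
              // An empty file has no blocks to repair.
              return blocks.length == 0 ? Integer.MAX_VALUE : min;
            }
          }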

          HDFS-RAID: HighTide can co-exist with HDFS-RAID. HDFS-RAID allows us to keep fewer physical copies of data + parity. The MTR from RAID is smaller compared to HighTide, but the savings from HighTide are greater because the savings percentage does not depend on the RAID stripe size or on file lengths. One can use RAID to achieve a replication factor of 1.2 in each HDFS cluster and then use HighTide to have an additional 1.2 replicas on the remote HDFS cluster, i.e. about 2.4 physical copies in total instead of 6.

          Performance: Map-reduce jobs could see a performance impact if the number of replicas is reduced from 3 to 2. So the tradeoff is reducing the total amount of storage while possibly increasing job latencies.

          HBase: With the current HBase design it is difficult to use HighTide to replicate across data centers. This is something that we need to delve more into.

          Arun C Murthy added a comment -

          > There are many instances when the same piece of data resides on multiple HDFS clusters in different data centers. The primary reason being that the physical limitation of one data center is insufficient to host the entire data set.

          Or, you want the same data-set in multiple data-centres for BCP clusters, but do not want to store 6 copies...


            People

            • Assignee: dhruba borthakur
            • Reporter: dhruba borthakur
            • Votes: 1
            • Watchers: 54
