[HDFS-2537] re-replicating under replicated blocks should be more dynamic - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.20.205.0, 0.23.0, 2.0.0-alpha
Fix Version/s: None
Component/s: None
Labels: None

Description

When a node fails or is decommissioned, a large number of blocks become under-replicated. Because re-replication work is distributed across the cluster, the hope would be that all blocks could be restored to their desired replication factor in very short order. This doesn't happen, though, because the load the cluster is willing to devote to re-replication is mostly static (controlled by configuration variables). Because that load is mostly static, the rate has to be set conservatively to avoid overloading the cluster with replication work.

This problem is especially noticeable when you have lots of small blocks. It can take many hours to re-replicate the blocks that were on a node, even while the cluster is mostly idle.
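
To illustrate why a static rate hurts most with small blocks, here is a rough back-of-the-envelope sketch. It assumes a simplified model in which the cluster schedules a fixed number of blocks per live node in each scheduling interval, regardless of block size or current load. The function name, parameter names, and example numbers are hypothetical and are not taken from HDFS configuration or code.

```python
def estimate_rereplication_seconds(
    under_replicated_blocks: int,
    live_nodes: int,
    blocks_per_node_per_interval: int,
    interval_seconds: float,
) -> float:
    """Estimate the time to schedule all under-replicated blocks.

    Simplified, hypothetical model: each interval, the cluster schedules
    at most live_nodes * blocks_per_node_per_interval blocks, no matter
    how small the blocks are or how idle the cluster is.
    """
    blocks_per_interval = live_nodes * blocks_per_node_per_interval
    if blocks_per_interval <= 0:
        raise ValueError("the static rate must be positive")
    # Ceiling division: a partially filled final interval still costs a full one.
    intervals = -(-under_replicated_blocks // blocks_per_interval)
    return intervals * interval_seconds


# Assumed example: 1,000,000 small blocks lost, 100 live nodes,
# 2 blocks per node per interval, one interval every 3 seconds.
seconds = estimate_rereplication_seconds(1_000_000, 100, 2, 3.0)
print(f"{seconds / 3600:.1f} hours")  # prints "4.2 hours"
```

In this model the estimate depends only on the block count, not on the bytes to copy, so a node full of small blocks takes hours to recover even when each individual copy is nearly free and the cluster has spare capacity.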

People

Assignee: Unassigned

Reporter: Nathan Roberts

Votes: 0

Watchers: 20

Dates

Created: 03/Nov/11 19:43

Updated: 10/Mar/15 04:36