Details
Improvement
Status: Closed
Major
Resolution: Fixed
0.10.1
None
None
Description
I see two kinds of replication that should take precedence over all others:
1. Blocks that have only one remaining copy (but are required to have higher replication).
2. Blocks that have less than 1/3 of their replicas in place.
The latter occurs when map/reduce sets the replication of certain files to 10. We want that
replication to happen fast so that the tasks reading those files perform better.
So I think we should distinguish two major groups of under-replicated blocks:
first-priority blocks (only 1 copy, or fewer than 1/3 of the required replicas) and the rest.
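The first-priority test can be written as a single predicate. This is a minimal sketch, not the actual name-node code: the class name `FirstPriorityCheck` and the method `isFirstPriority` are hypothetical. The "fewer than 1/3" rule is expressed as the integer comparison `live * 3 < required`, which avoids floating-point rounding.

```java
/**
 * Hypothetical sketch of the first-priority rule proposed in this issue.
 * Names are illustrative only; they are not taken from the Hadoop source.
 */
public class FirstPriorityCheck {

    /** True if an under-replicated block belongs to the first-priority group. */
    static boolean isFirstPriority(int liveReplicas, int requiredReplicas) {
        if (liveReplicas >= requiredReplicas) {
            return false; // not under-replicated at all
        }
        // Only one copy left, or fewer than 1/3 of the required replicas.
        // Integer form of: liveReplicas < requiredReplicas / 3.0
        return liveReplicas == 1 || liveReplicas * 3 < requiredReplicas;
    }

    public static void main(String[] args) {
        System.out.println(isFirstPriority(1, 3));  // true: single remaining copy
        System.out.println(isFirstPriority(2, 3));  // false
        System.out.println(isFirstPriority(3, 10)); // true: 3 < 10/3 (about 3.33)
        System.out.println(isFirstPriority(4, 10)); // false
    }
}
```

For the map/reduce case above (replication raised to 10), a block still counts as first priority while it has 3 or fewer replicas.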
The name-node places first-priority blocks at the beginning of the neededReplication
list and the rest at the end. That way first-priority blocks are replicated first,
and the others afterwards.
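The front/back placement can be sketched with a double-ended queue. This is an illustrative model only, assuming block IDs are plain strings and that a replication worker always takes the next block from the front. `NeededReplicationList`, `add`, and `next` are hypothetical names standing in for the neededReplication list mentioned above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of the two-group ordering: first-priority blocks are
 * inserted at the head of a single deque, all others at the tail.
 */
public class NeededReplicationList {
    private final Deque<String> blocks = new ArrayDeque<>();

    /** Only one copy left, or fewer than 1/3 of the required replicas. */
    static boolean isFirstPriority(int liveReplicas, int requiredReplicas) {
        return liveReplicas < requiredReplicas
            && (liveReplicas == 1 || liveReplicas * 3 < requiredReplicas);
    }

    /** Enqueue an under-replicated block according to its priority group. */
    void add(String blockId, int liveReplicas, int requiredReplicas) {
        if (isFirstPriority(liveReplicas, requiredReplicas)) {
            blocks.addFirst(blockId); // jumps ahead of every waiting block
        } else {
            blocks.addLast(blockId);  // waits behind everything already queued
        }
    }

    /** Next block to replicate, or null if nothing is pending. */
    String next() {
        return blocks.pollFirst();
    }

    public static void main(String[] args) {
        NeededReplicationList list = new NeededReplicationList();
        list.add("blk_a", 2, 3);  // ordinary: 2 of 3 replicas
        list.add("blk_b", 1, 3);  // first priority: single copy
        list.add("blk_c", 2, 10); // first priority: 2 < 10/3
        list.add("blk_d", 5, 10); // ordinary
        String id;
        while ((id = list.next()) != null) {
            System.out.println(id); // prints blk_c, blk_b, blk_a, blk_d
        }
    }
}
```

One consequence of inserting at the head is that first-priority blocks are served newest-first among themselves (`blk_c` before `blk_b` above). If FIFO order within each group matters, two separate queues drained in order would preserve it at the cost of a second structure.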