Details
Sub-task
Status: Resolved
Major
Resolution: Fixed
Reviewed
Description
Issue 1: Correctly mark corrupted blocks.
Issue 2: Distinguish highest-risk priority from normal-risk priority.
UnderReplicatedBlocks.java
private int getPriority(int curReplicas,
    ...
  } else if (curReplicas == 1) {
    // only one replica - risk of loss
    // highest priority
    return QUEUE_HIGHEST_PRIORITY;
  ...
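For striped blocks, we should return QUEUE_HIGHEST_PRIORITY when curReplicas == 6 (assuming a 6+3 schema). With only 6 of the 9 internal blocks left, there is no redundancy remaining: losing one more block makes the block group unrecoverable.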
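The rule above can be sketched as follows. This is a minimal illustration, not the actual patch: the class name, the method `getStripedPriority`, its parameters, and the numeric queue values are all hypothetical, and the real `UnderReplicatedBlocks` logic handles more cases (for example, decommissioned replicas).

```java
/** Hypothetical sketch; names and constant values are illustrative only. */
final class StripedPrioritySketch {
    static final int QUEUE_HIGHEST_PRIORITY = 0;
    static final int QUEUE_UNDER_REPLICATED = 2;
    static final int QUEUE_WITH_CORRUPT_BLOCKS = 4;

    /**
     * @param liveBlocks internal blocks of the group that are still available
     * @param dataBlocks number of data blocks in the schema (6 for 6+3)
     */
    static int getStripedPriority(int liveBlocks, int dataBlocks) {
        if (liveBlocks < dataBlocks) {
            // Too few blocks remain to reconstruct the group at all.
            return QUEUE_WITH_CORRUPT_BLOCKS;
        } else if (liveBlocks == dataBlocks) {
            // No redundancy left: one more loss means data loss.
            return QUEUE_HIGHEST_PRIORITY;
        }
        // At least one spare block is still available.
        return QUEUE_UNDER_REPLICATED;
    }
}
```

Under these assumptions, a 6+3 group with 6 live blocks gets the same priority as a replicated block with a single remaining replica.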
Returning the highest priority for such blocks matters because of this check during source selection:
BlockManager.java
DatanodeDescriptor[] chooseSourceDatanodes(BlockInfo block,
    ...
    if (priority != UnderReplicatedBlocks.QUEUE_HIGHEST_PRIORITY
        && !node.isDecommissionInProgress()
        && node.getNumberOfBlocksToBeReplicated() >= maxReplicationStreams) {
      continue; // already reached replication limit
    }
    ...
Without the highest priority, this method may return too few source DNs (perhaps only 5 when 6 are needed), and recovery fails.
A busy node should not be skipped when a block has the highest risk/priority. The problem is that striped blocks are never assigned that priority.
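The effect of the replication-limit check can be reproduced with a small, self-contained model. This is a sketch under stated assumptions: `SourceSelectionSketch`, `Node`, and `chooseSources` are hypothetical stand-ins for the `BlockManager` logic quoted above, the queue values are illustrative, and the decommission-in-progress condition is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the source-selection filter; not the HDFS implementation. */
final class SourceSelectionSketch {
    static final int QUEUE_HIGHEST_PRIORITY = 0;
    static final int QUEUE_UNDER_REPLICATED = 2;

    /** Simplified stand-in for a datanode descriptor. */
    static final class Node {
        final String name;
        final int blocksToBeReplicated;

        Node(String name, int blocksToBeReplicated) {
            this.name = name;
            this.blocksToBeReplicated = blocksToBeReplicated;
        }
    }

    static List<Node> chooseSources(List<Node> candidates, int priority,
                                    int maxReplicationStreams) {
        List<Node> sources = new ArrayList<>();
        for (Node node : candidates) {
            if (priority != QUEUE_HIGHEST_PRIORITY
                    && node.blocksToBeReplicated >= maxReplicationStreams) {
                continue; // busy node skipped for non-highest priorities
            }
            sources.add(node);
        }
        return sources;
    }
}
```

With six candidate nodes of which one is at the stream limit, a non-highest priority yields only five sources, which is one short of the six needed to reconstruct a 6+3 block group. With the highest priority, all six are returned.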