Description
Missing blocks (with replication factor 1) metric is not always decreased when file is deleted.
If a file is deleted, the remove function of UnderReplicatedBlocks can be called with the wrong priority (UnderReplicatedBlocks.LEVEL), if it is called with the wrong priority the corruptReplOneBlocks metric is not decreased, however the block is removed from the priority queue which contains it.
The corresponding code:
/** remove a block from a under replication queue */ synchronized boolean remove(BlockInfo block, int oldReplicas, int oldReadOnlyReplicas, int decommissionedReplicas, int oldExpectedReplicas) { final int priLevel = getPriority(oldReplicas, oldReadOnlyReplicas, decommissionedReplicas, oldExpectedReplicas); boolean removedBlock = remove(block, priLevel); if (priLevel == QUEUE_WITH_CORRUPT_BLOCKS && oldExpectedReplicas == 1 && removedBlock) { corruptReplOneBlocks--; assert corruptReplOneBlocks >= 0 : "Number of corrupt blocks with replication factor 1 " + "should be non-negative"; } return removedBlock; } /** * Remove a block from the under replication queues. * * The priLevel parameter is a hint of which queue to query * first: if negative or >= \{@link #LEVEL} this shortcutting * is not attmpted. * * If the block is not found in the nominated queue, an attempt is made to * remove it from all queues. * * <i>Warning:</i> This is not a synchronized method. * @param block block to remove * @param priLevel expected privilege level * @return true if the block was found and removed from one of the priority queues */ boolean remove(BlockInfo block, int priLevel) { if(priLevel >= 0 && priLevel < LEVEL && priorityQueues.get(priLevel).remove(block)) { NameNode.blockStateChangeLog.debug( "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block {}" + " from priority queue {}", block, priLevel); return true; } else { // Try to remove the block from all queues if the block was // not found in the queue for the given priority level. for (int i = 0; i < LEVEL; i++) { if (i != priLevel && priorityQueues.get(i).remove(block)) { NameNode.blockStateChangeLog.debug( "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block" + " {} from priority queue {}", block, i); return true; } } } return false; }
It is already fixed on trunk by this jira: HDFS-10999, but that ticket introduces new metrics, which I think should't be backported to branch-2.