|
This fixes the problem.
I tried to make a unit test, but it is really hard to make it deterministic.
Konstantin Shvachko made changes - 24/Dec/08 12:05 AM
This is the patch for 0.18 branch.
I tested it manually. I reproduced the error with current code and verified this is not happening with the patch.
Konstantin Shvachko made changes - 24/Dec/08 12:21 AM
Konstantin Shvachko made changes - 24/Dec/08 12:22 AM
ant test-patch results:
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no tests are needed for this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
All tests except TestMapReduceLocal passed.
I just committed this.
Konstantin Shvachko made changes - 31/Dec/08 01:17 AM
Nigel Daley made changes - 30/Jan/09 08:14 PM
Owen O'Malley made changes - 08/Jul/09 04:43 PM
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This happens when data-nodes are flaky or busy and name-node looses them and replicates blocks to other nodes, but the old data-nodes come back and report extra (old) replicas for blocks that have already been re-replicated.
Thus the block becomes over-replicated and therefore is placed into excessReplicateMap.
The data-nodes delete excessive replicas, but before the deletion is reported back to the name-node the latter calls processMisReplicatedBlocks() which clears excessReplicateMap in an attempt to restore it from scratch.
The error is that after clearing excessReplicateMap all replicas become valid from the name-node point of view although some of them have already been deleted by data-nodes. So if there were 6 replicas of the same block and before processMisReplicatedBlocks() first three were deleted, then after processMisReplicatedBlocks() clears excessReplicateMap the other three can be also removed and there will be no more replicas of the block.