|
That was a premature comment. Actually, I talked to Raghu regarding this.
bq For (2), it'll be nice if the namenode can delete the corrupted block if there's a good replica on other nodes.
Right now, if there are good replicas, the namenode does replicate the good blocks after it times out while trying to replicate the corrupt block. This was fixed by
bq For (3), I prefer if the namenode can still replicate the block.
With the current policy, if all blocks are corrupted, the namenode would delete 2 of them and, since it fails to replicate, it keeps on trying as mentioned in
Now, do we want that single replica to be replicated? In that case it is similar to the namenode not looping while replicating.
Had a discussion with Dhruba and Raghu regarding the organization of blocks and how best the situation described in this JIRA could be addressed.
To summarize: A block could have all of its replicas corrupted at any point of time in HDFS. By default, HDFS detects this and deletes corrupted blocks until the last copy. The last copy is never deleted. The request of this JIRA is to allow replication of this corrupted single replica so that we have more than one copy of this block even if it is corrupt. Another way to look at the problem is that when all replicas are corrupt, let's not delete any of them.
Initially, we thought we should somehow mark a block as corrupt if we identify that its replicas are bad, but without checking (by reading, or by a datanode reporting a bad block), we do not know if all replicas of the block are corrupt. Consider the case when HDFS is up and running and one of the datanodes reports that the replica it holds is corrupt. The current behavior is that we delete this block and request replication.
One proposed idea is as follows. When we get a notification from a datanode that the block is corrupt, we mark the block associated with this datanode as corrupt and store it in DatanodeDescriptor, possibly as a list. We also request a replication of this block but do not delete this replica yet. At this point, if we have another good replica, it would get replicated and eventually we would get the additional addBlock request. Now, we check if this is an additional copy and if this block has a corrupt replica; if so, we add this new block and delete the corrupt replica. If a block has all of its copies corrupt, then in some time we do come to know about this, and we haven't requested any of them to be deleted. We should think about how to filter this copy (similar to the decommission flag) and how best such blocks could be reported. Thoughts?
When a datanode reports a block as corrupt, instead of deleting it we mark the (datanode-block) pair as corrupt and request replication of this block.
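The mark-then-replace flow proposed above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the namenode code: ReplicaTracker, reportCorrupt, replicas and needsReplication are hypothetical names, blocks are plain long ids, and datanodes are plain strings. Only addBlock echoes a name used in the discussion, and its signature here is made up.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the namenode's block bookkeeping. */
class ReplicaTracker {
    // block id -> datanodes that hold a replica (good or corrupt)
    private final Map<Long, Set<String>> locations = new HashMap<>();
    // block id -> datanodes whose replica was reported corrupt
    private final Map<Long, Set<String>> corrupt = new HashMap<>();
    // blocks for which replication has been requested
    private final Set<Long> neededReplication = new HashSet<>();

    /** A datanode reports its replica as corrupt: mark it, keep it, ask for replication. */
    void reportCorrupt(long blockId, String datanode) {
        if (!locations.getOrDefault(blockId, Collections.emptySet()).contains(datanode)) {
            return; // unknown replica, nothing to mark
        }
        corrupt.computeIfAbsent(blockId, k -> new HashSet<>()).add(datanode);
        neededReplication.add(blockId);
    }

    /**
     * A datanode reports a replica. If this is an additional copy of a block
     * that has corrupt replicas, the corrupt replicas are dropped.
     * Returns the datanodes whose corrupt replica should be deleted.
     */
    Set<String> addBlock(long blockId, String datanode) {
        Set<String> holders = locations.computeIfAbsent(blockId, k -> new HashSet<>());
        boolean additional = holders.add(datanode);
        Set<String> bad = corrupt.get(blockId);
        if (!additional || bad == null) {
            return Collections.emptySet();
        }
        // Assumption: a new copy can only come from a good source replica.
        holders.removeAll(bad);
        corrupt.remove(blockId);
        // Simplification: treat the replication request as satisfied.
        neededReplication.remove(blockId);
        return bad;
    }

    Set<String> replicas(long blockId) {
        return new HashSet<>(locations.getOrDefault(blockId, Collections.emptySet()));
    }

    boolean needsReplication(long blockId) {
        return neededReplication.contains(blockId);
    }
}
```

If every replica of a block is reported corrupt, no good copy ever arrives through addBlock, so nothing is deleted and all corrupt copies are retained, which is the behavior asked for in this JIRA.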
I think this would take care of retaining all corrupt copies, but one case where I see a problem is the pendingReplication thread, which would keep on looping to replicate corrupt blocks. We could have a check here to see if the number of pending replicas for a block is equal to the number of corrupt copies, and if so remove the block from the pendingReplication thread.
Inside FSNameSystem.processPendingReplications() we fetch the timedOutItems, a list of blocks. We could check if all copies of such blocks are corrupt; if so, just log it and do not add it to the neededReplication queue. Anything else I should consider? Thoughts?
+1. I like this approach.
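A minimal sketch of the check suggested for timed-out items, assuming replica counts are available as simple maps. PendingReplicationFilter and blocksToRequeue are hypothetical names, not part of FSNameSystem; the real processPendingReplications() works on the namenode's own block structures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper illustrating the "skip all-corrupt blocks" check. */
class PendingReplicationFilter {
    /**
     * @param timedOut        block ids whose replication request timed out
     * @param totalReplicas   block id -> number of replicas known to the namenode
     * @param corruptReplicas block id -> number of those replicas marked corrupt
     * @return the blocks that should go back on the neededReplication queue
     */
    static List<Long> blocksToRequeue(List<Long> timedOut,
                                      Map<Long, Integer> totalReplicas,
                                      Map<Long, Integer> corruptReplicas) {
        List<Long> requeue = new ArrayList<>();
        for (Long block : timedOut) {
            int total = totalReplicas.getOrDefault(block, 0);
            int corrupt = corruptReplicas.getOrDefault(block, 0);
            if (total > 0 && corrupt >= total) {
                // Every copy is corrupt: log it and stop retrying.
                System.err.println("Block " + block + ": all " + total
                        + " replicas are corrupt, not re-queuing");
                continue;
            }
            requeue.add(block);
        }
        return requeue;
    }
}
```

Blocks with at least one replica not marked corrupt are re-queued as before, so only the all-corrupt case leaves the retry loop.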
This initial patch addresses all the above points.
After talking to Dhruba, here are some more things which need to be taken care of to complete this
One way to implement this is
This is being done to get rid of the corruptList in DatanodeDescriptor. Anything we might have missed? Another approach suggested by Konstantin is to have a global map of corruptBlocks. This has 2 advantages
Extending BlockInfo and replacing it back seems complicated. If we move the list to be a global list, all we have to handle is a new data structure returned via getBlockLocations. We could have something like LocatedReplicas instead of DatanodeInfo inside LocatedBlock. I think this solves the whole problem. Thoughts?
This patch is based on the approach discussed above.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12380906/HADOOP-2065-2.patch against trunk revision 645773.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 4 new or modified tests.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new javac compiler warnings.
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2325/testReport/
This message is automatically generated.
Attaching the patch against trunk.
Thanks Konstantin, this new patch adds all the changes you suggested.
Attached is the patch against trunk.
Had missed adding comments.
When adding entries to corruptReplicasMap, it might be good to get a BlockInfo object and use it as a key. This idea can be found in
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12381635/HADOOP-2065-6.patch against trunk revision 654315.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 4 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2425/testReport/
This message is automatically generated.
This looks correct.
I am concerned that we probably need to move this into a separate data structure, so that we do not have to refactor it later as we did with other block collections. So,
Thanks Konstantin. The attached patch adds a separate class, CorruptReplicasMap, as you suggested.
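For readers following along, a global map from a block to the datanodes holding its corrupt replicas could look roughly like the sketch below. It is not the class from the patch: CorruptReplicasSketch and all of its method names are hypothetical, and it keys on a plain long block id and string datanode names instead of the namenode's own block and datanode types.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical sketch of a global corrupt-replica map kept by the namenode. */
class CorruptReplicasSketch {
    // block id -> datanodes whose copy of that block is known to be corrupt
    private final Map<Long, Collection<String>> corruptReplicas = new HashMap<>();

    /** Record that the replica of blockId on datanode is corrupt. */
    void add(long blockId, String datanode) {
        corruptReplicas.computeIfAbsent(blockId, k -> new TreeSet<>()).add(datanode);
    }

    /** Forget all corrupt replicas of a block, e.g. once they are deleted. */
    void removeBlock(long blockId) {
        corruptReplicas.remove(blockId);
    }

    boolean isCorrupt(long blockId, String datanode) {
        Collection<String> nodes = corruptReplicas.get(blockId);
        return nodes != null && nodes.contains(datanode);
    }

    int numCorrupt(long blockId) {
        Collection<String> nodes = corruptReplicas.get(blockId);
        return nodes == null ? 0 : nodes.size();
    }

    /** Read-only view of the datanodes holding corrupt replicas of a block. */
    Collection<String> nodes(long blockId) {
        Collection<String> nodes = corruptReplicas.get(blockId);
        return nodes == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableCollection(nodes);
    }
}
```

Keeping this state in one place, rather than in a per-datanode list, means a lookup by block answers both "is this replica corrupt?" and "how many corrupt copies are there?" without touching each DatanodeDescriptor.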
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12381849/HADOOP-2065-7.patch against trunk revision 655337.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 4 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2448/testReport/
This message is automatically generated.
I just committed this. Thanks Lohit.
Integrated in Hadoop-trunk #489 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/489/)
This seems related to
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
HADOOP-2012. When it detects a corrupt block, it just asks the Namenode to delete it (same interface is used by client when it detects a bad block). In this case, namenode deletes the block as long as there are more replicas. So it does not really make sure that there is at least one good replica.