The main features in this patch.
Konstantin Shvachko made changes - 02/Apr/08 08:19 PM
Konstantin Shvachko made changes - 02/Apr/08 08:20 PM
Konstantin Shvachko made changes - 02/Apr/08 08:25 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12379179/SecondaryStorage.patch
against trunk revision 643282.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 3 new or modified tests.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new javac compiler warnings.
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs -1. The patch appears to introduce 3 new Findbugs warnings.
core tests -1. The patch failed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2127/testReport/
This message is automatically generated.
Konstantin Shvachko made changes - 09/Apr/08 01:39 AM
This is a new patch that
The latter was tricky. TestCheckpoint failed on Hudson but not on any of the other machines I tested it on. The failure occurs because the name-node, when it started, did not get an exclusive lock on its storage directory as required. I initially suspected a Solaris problem, but later realized that it is an NFS problem: NFS may not support exclusive locks consistently.
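For readers unfamiliar with this kind of locking, here is a minimal sketch of taking an exclusive lock on a storage directory with the standard java.nio API. It is an illustration only, not the code in the patch: the class name, the method name and the lock file name "in_use.lock" are assumptions made for the example.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// Illustrative sketch only; names are hypothetical, not taken from the patch.
final class StorageLockSketch {

    /**
     * Tries to take an exclusive lock on a marker file inside storageDir.
     * Returns the lock on success, or null if the lock is already held
     * by this JVM or by another process.
     */
    static FileLock tryLock(File storageDir) throws IOException {
        // Assumed lock file name, chosen for the example.
        File lockFile = new File(storageDir, "in_use.lock");
        RandomAccessFile raf = new RandomAccessFile(lockFile, "rws");
        FileLock lock = null;
        try {
            // Returns null when another process holds the lock.
            lock = raf.getChannel().tryLock();
        } catch (OverlappingFileLockException e) {
            // The lock is already held within this JVM; report it as unavailable.
        } finally {
            if (lock == null) {
                raf.close(); // nothing acquired, so do not leak the file handle
            }
        }
        return lock;
    }
}
```

A caller would treat a null result as "directory already in use" and refuse to start. The lock is given up with lock.release() followed by lock.channel().close(). On file systems where locking is not supported consistently, as described above for NFS, tryLock() may succeed even though another process also believes it holds the lock, which is the kind of failure a test can hit on one machine and not on others.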
Konstantin Shvachko made changes - 11/Apr/08 02:46 AM
Konstantin Shvachko made changes - 11/Apr/08 03:03 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12379900/SecondaryStorage.patch
against trunk revision 645773.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 3 new or modified tests.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new javac compiler warnings.
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests -1. The patch failed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2202/testReport/
This message is automatically generated.

Code looks good. Only one comment:
1. The patch makes rollEditLog() return a WritableComparable. It might be better to make it return an object of type CheckpointSignature.

rollEditLog() now returns CheckpointSignature instead of WritableComparable. I had to factor out CheckpointSignature into a separate file.
Fixed unit test failure.
Konstantin Shvachko made changes - 11/Apr/08 10:31 AM
Konstantin Shvachko made changes - 11/Apr/08 10:32 AM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12379913/SecondaryStorage.patch
against trunk revision 645773.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 3 new or modified tests.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new javac compiler warnings.
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2207/testReport/
This message is automatically generated.

I just committed this.
Konstantin Shvachko made changes - 11/Apr/08 09:19 PM
Integrated in Hadoop-trunk #458 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/458/)
Robert Chansler made changes - 22/Jul/08 05:28 PM
Nigel Daley made changes - 22/Aug/08 07:50 PM
Owen O'Malley made changes - 08/Jul/09 04:42 PM
And we were able to reconstruct the namespace image from the secondary node using the following
manual procedure, which might be useful for those who find themselves in the same type of trouble.
Manual recovery procedure from the secondary image.
The old namespaceID can be obtained from one of the data-nodes:
just copy it from <dfs.data.dir>/current/VERSION.namespaceID (the namespaceID field of the VERSION file).
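As an illustration of that lookup, the following sketch reads the namespaceID field from a data-node VERSION file. It assumes the VERSION file is in Java properties format (key=value lines); the class and method names are made up for the example and are not part of Hadoop.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Illustrative helper; not part of Hadoop. Assumes VERSION is a properties file.
final class NamespaceIdReader {

    /** Returns the value of the namespaceID field in the given VERSION file. */
    static String readNamespaceId(String versionFilePath) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(versionFilePath)) {
            props.load(in);
        }
        String id = props.getProperty("namespaceID");
        if (id == null) {
            throw new IOException("No namespaceID field in " + versionFilePath);
        }
        return id.trim();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: NamespaceIdReader <dfs.data.dir>/current/VERSION");
            return;
        }
        System.out.println(readNamespaceId(args[0]));
    }
}
```

Opening the file in a text editor and copying the value by hand works just as well; the sketch only shows which field is meant.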
Automatic recovery proposal.
The proposal has 2 parts.
name-node storage directory structure. It is best if secondary node uses Storage class
(or FSImage if code re-use makes sense here) in order to maintain the checkpoint directory.
This should ensure that the checkpointed image is always ready to be read by a name-node
if the directory is listed in its "dfs.name.dir" list.
location of the image available for read-only access during startup.
This means that if the name-node finds all directories listed in "dfs.name.dir" unavailable or
finds their images corrupted, then it should turn to the "fs.checkpoint.dir" directory
and try to fetch the image from there. I think this should not be the default behavior but
rather triggered by a name-node startup option, something like:
So the name-node can start with the secondary image as long as the secondary node drive is mounted.
And the name-node will never attempt to write anything to this drive.
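The startup decision described above could be sketched roughly as follows. This is not the proposed implementation: the class, the boolean flag standing in for the startup option, the "current/fsimage" layout and the trivial usability check are all assumptions made for illustration, and real corruption detection would have to do much more than check for a non-empty file.

```java
import java.io.File;
import java.util.List;

// Illustrative sketch of the fallback decision; all names are hypothetical.
final class ImageSourceSelector {

    /** The directory to load the image from, and whether it must stay read-only. */
    static final class Choice {
        final File dir;
        final boolean readOnly;

        Choice(File dir, boolean readOnly) {
            this.dir = dir;
            this.readOnly = readOnly;
        }
    }

    /** Placeholder check (assumed layout): a non-empty current/fsimage file exists. */
    static boolean looksUsable(File dir) {
        File image = new File(new File(dir, "current"), "fsimage");
        return image.isFile() && image.length() > 0;
    }

    /**
     * Prefers any usable "dfs.name.dir" entry. Falls back to the
     * "fs.checkpoint.dir" directory only when explicitly allowed,
     * and marks that directory as read-only.
     * Returns null if no usable image is found.
     */
    static Choice choose(List<File> nameDirs, File checkpointDir,
                         boolean allowCheckpointFallback) {
        for (File dir : nameDirs) {
            if (looksUsable(dir)) {
                return new Choice(dir, false);
            }
        }
        if (allowCheckpointFallback && checkpointDir != null
                && looksUsable(checkpointDir)) {
            return new Choice(checkpointDir, true); // never written to
        }
        return null;
    }
}
```

Keeping the fallback behind an explicit flag mirrors the point made above: starting from a possibly outdated checkpoint should be a deliberate choice by the administrator, not something the name-node does silently.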
Added bonuses provided by this approach
This brings us a step closer to the hot standby.
support multiple entries in "fs.checkpoint.dir". This is of course if the administrator
chooses to accept outdated images in order to boost the name-node performance.