Hadoop HDFS
HDFS-3049

During the normal NN startup process, fall back on a different EditLog if we see one that is corrupt

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 3.0.0, 2.0.3-alpha
    • Component/s: namenode
    • Labels:
      None

      Description

      During the NameNode startup process, we load an image, and then apply edit logs to it until we believe that we have all the latest changes. Unfortunately, if there is an I/O error while reading any of these files, in most cases, we simply abort the startup process. We should try harder to locate a readable edit log and/or image file.

      There are three main use cases for this feature:
      1. If the operating system does not honor fsync (usually due to a misconfiguration), a file may end up in an inconsistent state.
      2. In certain older releases where we did not use fallocate() or similar to pre-reserve blocks, a disk full condition may cause a truncated log in one edit directory.
      3. There may be a bug in HDFS which results in some of the data directories receiving corrupt data, but not all. This is the least likely use case.

      Proposed changes to normal NN startup

      • We should try a different FSImage if we can't load the first one we try.
      • We should examine other FSEditLogs if we can't load the first one(s) we try.
      • We should fail if we can't find EditLogs that would bring us up to what we believe is the latest transaction ID.
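
      A minimal sketch of the failover behavior proposed above (hypothetical names only: EditLogCandidate, applyEdits and loadEditsWithFailover stand in for the real FSImage/JournalManager machinery):

      import java.io.IOException;
      import java.util.List;

      /** Hypothetical stand-in for one readable copy of an edit log segment. */
      interface EditLogCandidate {
        /** Apply edits starting at fromTxId; return the last txid applied. */
        long applyEdits(long fromTxId) throws IOException;
      }

      class StartupSketch {
        /** Try each candidate log; fail only if we never reach the expected last txid. */
        static long loadEditsWithFailover(List<EditLogCandidate> candidates,
            long startTxId, long expectedLastTxId) throws IOException {
          long appliedUpTo = startTxId - 1;
          for (EditLogCandidate candidate : candidates) {
            try {
              appliedUpTo = candidate.applyEdits(appliedUpTo + 1);
              if (appliedUpTo >= expectedLastTxId) {
                return appliedUpTo;  // caught up to the latest known transaction
              }
            } catch (IOException e) {
              // corrupt or unreadable copy: fall back to the next candidate
            }
          }
          // Normal startup must fail if no set of logs reaches the expected txid.
          throw new IOException("Could not load edits up to txid " + expectedLastTxId
              + "; only reached txid " + appliedUpTo);
        }
      }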

      Proposed changes to recovery mode NN startup:
      We should list all the available storage directories and let the operator select which one to use.
      Something like this:

      Multiple storage directories found.
      1. /foo/bar
          edits__current__XYZ          size:213421345       md5:2345345
          image                                  size:213421345       md5:2345345
      2. /foo/baz
          edits__current__XYZ          size:213421345       md5:2345345345
          image                                  size:213421345       md5:2345345
      Which one would you like to use? (1/2)
      

      As usual in recovery mode, we want to be flexible about error handling. In this case, this means that we should NOT fail if we can't find EditLogs that would bring us up to what we believe is the latest transaction ID.
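
      In recovery mode, the same loading loop would tolerate falling short of the expected last transaction ID. A rough sketch, reusing the hypothetical EditLogCandidate interface from the earlier sketch and the NameNode's commons-logging style logger:

      import java.io.IOException;
      import java.util.List;

      import org.apache.commons.logging.Log;
      import org.apache.commons.logging.LogFactory;

      class RecoverySketch {
        private static final Log LOG = LogFactory.getLog(RecoverySketch.class);

        /** Best-effort loading: warn, but do not fail, if we fall short of the expected txid. */
        static long loadEditsBestEffort(List<EditLogCandidate> candidates,
            long startTxId, long expectedLastTxId) {
          long appliedUpTo = startTxId - 1;
          for (EditLogCandidate candidate : candidates) {
            try {
              appliedUpTo = candidate.applyEdits(appliedUpTo + 1);
            } catch (IOException e) {
              LOG.warn("Skipping unreadable edit log copy", e);  // stay flexible in recovery mode
            }
          }
          if (appliedUpTo < expectedLastTxId) {
            LOG.warn("Only recovered edits up to txid " + appliedUpTo + ", expected "
                + expectedLastTxId + "; continuing anyway.");
          }
          return appliedUpTo;
        }
      }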

      Not addressed by this feature
      This feature will not address the case where an attempt to access the NameNode name directory or directories hangs because of an I/O error. This may happen, for example, when trying to load an image from a hard-mounted NFS directory, when the NFS server has gone away. Just as now, the operator will have to notice this problem and take steps to correct it.

      1. hdfs-3049-branch-2.txt
        28 kB
        Todd Lipcon
      2. HDFS-3049.028.patch
        40 kB
        Colin Patrick McCabe
      3. HDFS-3049.028.patch
        40 kB
        Colin Patrick McCabe
      4. HDFS-3049.028.patch
        40 kB
        Colin Patrick McCabe
      5. HDFS-3049.027.patch
        41 kB
        Colin Patrick McCabe
      6. HDFS-3049.026.patch
        39 kB
        Colin Patrick McCabe
      7. HDFS-3049.025.patch
        39 kB
        Colin Patrick McCabe
      8. HDFS-3049.023.patch
        37 kB
        Colin Patrick McCabe
      9. HDFS-3049.021.patch
        87 kB
        Colin Patrick McCabe
      10. HDFS-3049.018.patch
        118 kB
        Colin Patrick McCabe
      11. HDFS-3049.017.patch
        117 kB
        Colin Patrick McCabe
      12. HDFS-3049.015.patch
        114 kB
        Colin Patrick McCabe
      13. HDFS-3049.013.patch
        88 kB
        Colin Patrick McCabe
      14. HDFS-3049.012.patch
        88 kB
        Colin Patrick McCabe
      15. HDFS-3049.011.patch
        79 kB
        Colin Patrick McCabe
      16. HDFS-3049.010.patch
        79 kB
        Colin Patrick McCabe
      17. HDFS-3049.007.against3335.patch
        72 kB
        Colin Patrick McCabe
      18. HDFS-3049.006.against3335.patch
        69 kB
        Colin Patrick McCabe
      19. HDFS-3049.005.against3335.patch
        53 kB
        Colin Patrick McCabe
      20. HDFS-3049.003.patch
        23 kB
        Colin Patrick McCabe
      21. HDFS-3049.002.patch
        23 kB
        Colin Patrick McCabe
      22. HDFS-3049.001.patch
        22 kB
        Colin Patrick McCabe

        Issue Links

          Activity

          Hari Mankude added a comment -

          Additionally, I think one more feature needs to be added to the proposed changes.

          It would be good to give the user the ability to skip over an edit log entry at a particular txid. The reason for this is that such an entry may be corrupted or malformed, causing edit log recovery to freeze at that location or, even worse, the NN to crash. A separate JIRA can be opened for this feature if it does not match up with the above changes.

          Colin Patrick McCabe added a comment -

          Hi Hari,

          What you're requesting sounds like Recovery mode. There is a separate JIRA for recovery mode-- check it out at https://issues.apache.org/jira/browse/HDFS-3004

          C.

          Todd Lipcon added a comment -

          The description of the JIRA is a bit strange in one aspect: it seems to indicate that we load from one storage directory currently. This is true for the images, but not true for edits - we already handle the case of loading edits from a "merged" view of several directories.

          I'd propose the following:

          1) Split image-loading failover out into a separate JIRA from edits-loading failover.
          I think the best we can do for image loading is the following:

          • add a try-catch clause around the image loading code. If it fails to load the image, print out an error like:
            "Failed loading image from /foo/bar/fsimage_12342". Please restart the namenode with the "-verifyImageBeforeLoad" flag to search for a different image.
            This new flag would then cause the NN to read through the image, compute its md5, and compare against the stored md5. If the md5 is determined to be invalid, it would skip to a different image.

          2) Use this JIRA to handle recovery on the edit-loading path. Here, the behavior is not to pick some single storage directory to load from, but rather:

          • if we hit a checksum error, try to switch to another underlying JournalManager. If all JournalManagers have been tried for a given txnid, but none of them succeeded, abort.
          • don't worry about semantic errors for now – I think we can handle that in a separate JIRA.
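
          A sketch of that failover loop (JournalSource and readTransaction are hypothetical placeholders; the real code would go through JournalManager and EditLogInputStream):

          import java.io.IOException;
          import java.util.List;

          /** Hypothetical wrapper around one JournalManager's input for a txid range. */
          interface JournalSource {
            Object readTransaction(long txid) throws IOException;  // returns the op, simplified
          }

          class EditFailoverSketch {
            /** Try each journal for the wanted txid; abort only if all of them fail. */
            static Object readWithFailover(List<JournalSource> journals, long txid)
                throws IOException {
              IOException lastError = null;
              for (JournalSource journal : journals) {
                try {
                  return journal.readTransaction(txid);  // may throw on a checksum error
                } catch (IOException e) {
                  lastError = e;  // remember the error and switch to the next journal
                }
              }
              throw new IOException("No journal could supply txid " + txid, lastError);
            }
          }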
          Colin Patrick McCabe added a comment -
          • implement edit log failover (no image stuff in here)
          Colin Patrick McCabe added a comment -

          remove references to FSImage (there is now a separate JIRA for that)

          Colin Patrick McCabe added a comment -

          the same thing for the fsimage

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12522637/HDFS-3049.001.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hdfs.server.namenode.TestFSEditLogLoader
          org.apache.hadoop.hdfs.server.namenode.TestGenericJournalConf

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2276//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2276//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -
          • fix tests
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12522871/HDFS-3049.002.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause tar ant target to fail.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2284//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2284//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -

          rebase on trunk

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12523005/HDFS-3049.003.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2289//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2289//console

          This message is automatically generated.

          Todd Lipcon added a comment -
          • can you please add some comments to the code explaining the behavior of getInputStream? eg perhaps a comment above the definition of "best" saying "the set of streams which all contain the same number of transactions, starting from this point".
          • I think some of the new INFO logs are better off being debug
          • Am I correct in understanding that MergedEditLogInputStream only provides failover functionality between multiple logs which contain the same exact set of transactions? In that case, I think a better name is something like FailoverEditLogInputStream. Also, I think you should iterate through the streams in the constructor and verify that they all have matching metadata (version, first tx, last tx, length), throwing an IllegalArgumentException if not.
          • You can use MultipleIOException to group all the exception causes together
          • The new class needs an interface audience annotation (Private) - or can it be made package-private?
          • If you have an error on the very first transaction read, then needSkip gets set true but prevTxId is still INVALID_TXID right? Maybe instead initialize prevTxId to firstTxId-1? Or make the variable be nextTxId?
          • Can you add some comments to nextOp()? I am having trouble following the control flow – why don't you just do the "skipping" in the error handling path instead of using this "needSkip" flag?
          • In the exceptions you throw, you should include the name of the underlying stream
          • the debug message "now trying to read from" should probably be combined with the "got error" message above.
          • Is it really worth adding a new parameter to MiniDFSCluster for this, instead of just manually setting the name/edit dir configs for the few test cases where we need to ensure only a single dir?
          • Refactor out the code to corrupt a log file, since it now appears multiple times, and will make the test easier to digest
          • No need to catch the exception and fail() in the test case - if you just let the exception fall through, it will fail on its own
          +        public boolean accept(File dir, String name) {
          +          if (name.startsWith(NNStorage.getFinalizedEditsFileName(startErrorTxId, 
          +                                  endErrorTxId))) {
          +            return true;
          +          }
          +          return false;
          +        }
          

          why does this have to be so complicated? How would we ever have more than one file starting with edits_N-M in the directory? i.e. why not just do: File editLog = new File(new File(f1, "current"), NNStorage.getFinalized...)
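
          For reference, a sketch of that simplification (assuming f1, startErrorTxId and endErrorTxId are the same test variables as in the patch, and that NNStorage.getFinalizedEditsFileName is accessible from the test):

          import java.io.File;

          import org.apache.hadoop.hdfs.server.namenode.NNStorage;

          class EditsFileLookupSketch {
            /** Build the expected finalized edits file name directly instead of
             *  scanning the directory with a FilenameFilter. */
            static File finalizedEditsFile(File f1, long startErrorTxId, long endErrorTxId) {
              return new File(new File(f1, "current"),
                  NNStorage.getFinalizedEditsFileName(startErrorTxId, endErrorTxId));
            }
          }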

          Colin Patrick McCabe added a comment -
          • patch against 3335
          Colin Patrick McCabe added a comment -

          rebase patch against HDFS-3335

          Colin Patrick McCabe added a comment -
          • rebase patch on HDFS-3335
          • FSEditLogLoader#validateEditLog: update comment. We're not calculating numTransactions any more.
          • FSImage#loadFSImage: update comment. We no longer load the edits from the same directory as the image (or at least, not always.) We haven't done so for a long time.
          • FileJournalManager#getInputStream: there's no need to call skipUntil unless we need to go forward in the log to get to our starting txid.
          • FileJournalManager#recoverUnfinalizedSegments: we need the logic to take care of empty edit log segments. The problem is that the code assumes that an edit log that starts at X and ends at Y contains the _inclusive_ range [X, Y]. So we can't really represent the concept of an empty edit log internally. Rather than deleting these files, just move them aside, to be extra careful.
          • fix TestFSEditLog. Add a test of validating an empty edit log.
          Todd Lipcon added a comment -
          +        FSImage.LOG.info("Caught exception after reading " + numValid +
          +            " ops from " + in + " while determining its valid length.", t);
          

          this should probably be a WARN.

          I think this section also deserves a comment, or an adjustment to the method javadoc, explaining that, even if there are invalid transactions in the middle of the stream, it will skip over them, and return a result indicating the first and last transaction.

          The warning should also include the position within the log. If possible, after the resync() call, include the position it resynchronized to in another message.


          +        if (elis == null) {
          +          LOG.warn("No input stream found for JournalManager " + candidate);
          +          continue;
          +        }
          

          Should include the transaction ID it's looking for


                 } catch (CorruptionException ce) {
          -        corruption = ce;
          +        corruptions.add(ce);
          

          Should LOG.warn here


          +        LOG.warn("Unable to read input stream from JournalManager " +
          +          candidate, ioe);
          

          again, need the txid


          • RedundantEditLogInputStream is missing a license header
          • Does it need to be public? If so, add interface audience annotation (Private)
            +  private int cur;
            

            A more verbose name would be easier to understand - eg curStreamIndex or curIndex or curIdx if you want to be terse.


          • Can you mark the streams[] member final? I believe our style is also to put the [] with the type name - i.e EditLogInputStream[] streams;
          • You reference MergedEditLogInputStream in some of the comments - probably an old name of this class
          • maybe best to change those asserts to actual Preconditions.checkArgument checks, since this isn't performance-critical code
          • style: use // style comments for inline comments in methods
            +        long la = a.getLastTxId();
            +        long lb = b.getLastTxId();
            +        if (la < lb) {
            +          return 1;
            +        } else if (la > lb) {
            +          return -1;
            +        } else {
            +          return 0;
            +        }
            

            can be more simply written as return Longs.compare(b.getLastTxId(), a.getLastTxId());

          • Also, I would recommend copying the streams into the local array first, and then sorting that array, rather than mutating the list argument
          • Can you add doc on the State enum in this class? i.e what are the state transitions, etc?
          • I don't see where you reset cur back to 0 if you "wrap around". Don't you need to do cur %= streams.length; somewhere? What would happen if you had two copies of edits_1-100, where one had every odd edit corrupted, and the other had every even edit corrupted. Are you meant to handle this case?
          Colin Patrick McCabe added a comment -

          I don't see where you reset cur back to 0 if you "wrap around". Don't you need to do cur %= streams.length; somewhere? What would happen if you had two copies of edits_1-100, where one had every odd edit corrupted, and the other had every even edit corrupted. Are you meant to handle this case?

          This patch is not meant to handle that. I think it's very do-able, but it would require some thought-- and perhaps an additional state or two in the state machine. I guess I'll file a JIRA for that. It probably makes sense to do that work as part of fixing the selectInputStreams API.

          Colin Patrick McCabe added a comment -

          The new patch:

          • is now based off of trunk
          • fixes some style issues with RedundantEditLogInputStream
          • adds docs for RedundantEditLogInputStream#State
          • log the txid we were looking for when we encountered various exceptions
          • don't assume that all edit logs in the RedundantEditLogInputStream have the same version (always call in.getVersion() to find out what the version of the edit log you're reading is, never cache this value.)
          • EditLogInputStream#nextValidOp: catch RuntimeException as well as IOException. Although EditLogFileInputStream#nextOp is not supposed to throw RuntimeException, it never hurts to be safe.
          • Some test fixes
          • Add TestFSEditLogLoader#testValidateEmptyEditLog
          Colin Patrick McCabe added a comment -

          Let's see if this text box can take what I'm dishing out. As you know, the attachment system is still broken.

          diff --git hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperEditLogInputStream.java hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperEditLogInputStream.java
          index 9d070d9..07dd179 100644
          --- hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperEditLogInputStream.java
          +++ hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperEditLogInputStream.java
          @@ -79,12 +79,12 @@ class BookKeeperEditLogInputStream extends EditLogInputStream {
             }
           
             @Override
          -  public long getFirstTxId() throws IOException {
          +  public long getFirstTxId() {
               return firstTxId;
             }
           
             @Override
          -  public long getLastTxId() throws IOException {
          +  public long getLastTxId() {
               return lastTxId;
             }
             
          diff --git hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java
          index 047efd5..f00f0c2 100644
          --- hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java
          +++ hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java
          @@ -332,7 +332,6 @@ public class BookKeeperJournalManager implements JournalManager {
             }
           
             // TODO(HA): Handle inProgressOk
          -  @Override
             public long getNumberOfTransactions(long fromTxnId, boolean inProgressOk)
                 throws IOException {
               long count = 0;
          diff --git hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogTestUtil.java hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogTestUtil.java
          index 41f0292..a46f9cf 100644
          --- hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogTestUtil.java
          +++ hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogTestUtil.java
          @@ -34,6 +34,6 @@ public class FSEditLogTestUtil {
             public static long countTransactionsInStream(EditLogInputStream in) 
                 throws IOException {
               FSEditLogLoader.EditLogValidation validation = FSEditLogLoader.validateEditLog(in);
          -    return validation.getNumTransactions();
          +    return (validation.getEndTxId() - in.getFirstTxId()) + 1;
             }
           }
          diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java
          index a9aa20d..51e2728 100644
          --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java
          +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java
          @@ -207,7 +207,7 @@ public class BackupImage extends FSImage {
                 int logVersion = storage.getLayoutVersion();
                 backupInputStream.setBytes(data, logVersion);
           
          -      long numTxnsAdvanced = logLoader.loadEditRecords(logVersion, 
          +      long numTxnsAdvanced = logLoader.loadEditRecords(
                     backupInputStream, true, lastAppliedTxId + 1, null);
                 if (numTxnsAdvanced != numTxns) {
                   throw new IOException("Batch of txns starting at txnid " +
          diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java
          index ebf4f48..97d93f8 100644
          --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java
          +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java
          @@ -60,14 +60,6 @@ class BackupJournalManager implements JournalManager {
             }
           
             @Override
          -  public long getNumberOfTransactions(long fromTxnId, boolean inProgressOk)
          -      throws IOException, CorruptionException {
          -    // This JournalManager is never used for input. Therefore it cannot
          -    // return any transactions
          -    return 0;
          -  }
          -  
          -  @Override
             public EditLogInputStream getInputStream(long fromTxnId, boolean inProgressOk)
                 throws IOException {
               // This JournalManager is never used for input. Therefore it cannot
          diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogBackupInputStream.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogBackupInputStream.java
          index 1f514cd..e8747ff 100644
          --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogBackupInputStream.java
          +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogBackupInputStream.java
          @@ -129,12 +129,12 @@ class EditLogBackupInputStream extends EditLogInputStream {
             }
           
             @Override
          -  public long getFirstTxId() throws IOException {
          +  public long getFirstTxId() {
               return HdfsConstants.INVALID_TXID;
             }
           
             @Override
          -  public long getLastTxId() throws IOException {
          +  public long getLastTxId() {
               return HdfsConstants.INVALID_TXID;
             }
           
          diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java
          index 29c90e9..53f1d72 100644
          --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java
          +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java
          @@ -90,12 +90,12 @@ public class EditLogFileInputStream extends EditLogInputStream {
             }
           
             @Override
          -  public long getFirstTxId() throws IOException {
          +  public long getFirstTxId() {
               return firstTxId;
             }
             
             @Override
          -  public long getLastTxId() throws IOException {
          +  public long getLastTxId() {
               return lastTxId;
             }
           
          @@ -186,7 +186,7 @@ public class EditLogFileInputStream extends EditLogInputStream {
                 FSImage.LOG.warn("Log at " + file + " has no valid header",
                     corrupt);
                 return new FSEditLogLoader.EditLogValidation(0,
          -          HdfsConstants.INVALID_TXID, HdfsConstants.INVALID_TXID, true);
          +          HdfsConstants.INVALID_TXID, true);
               }
               
               try {
          diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogInputStream.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogInputStream.java
          index c2b42be..9504d68 100644
          --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogInputStream.java
          +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogInputStream.java
          @@ -45,12 +45,12 @@ public abstract class EditLogInputStream implements Closeable {
             /** 
              * @return the first transaction which will be found in this stream
              */
          -  public abstract long getFirstTxId() throws IOException;
          +  public abstract long getFirstTxId();
             
             /** 
              * @return the last transaction which will be found in this stream
              */
          -  public abstract long getLastTxId() throws IOException;
          +  public abstract long getLastTxId();
           
           
             /**
          @@ -73,14 +73,14 @@ public abstract class EditLogInputStream implements Closeable {
               }
               return nextOp();
             }
          -
          +  
             /** 
              * Position the stream so that a valid operation can be read from it with
              * readOp().
              * 
              * This method can be used to skip over corrupted sections of edit logs.
              */
          -  public void resync() throws IOException {
          +  public void resync() {
               if (cachedOp != null) {
                 return;
               }
          @@ -109,6 +109,8 @@ public abstract class EditLogInputStream implements Closeable {
               // error recovery will want to override this.
               try {
                 return nextOp();
          +    } catch (RuntimeException e) {
          +      return null;
               } catch (IOException e) {
                 return null;
               }
          diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
          index a48e5a6..a86cc79 100644
          --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
          +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
          @@ -90,7 +90,7 @@ public class FSEditLogLoader {
               fsNamesys.writeLock();
               try {
                 long startTime = now();
          -      long numEdits = loadEditRecords(logVersion, edits, false, 
          +      long numEdits = loadEditRecords(edits, false, 
                                            expectedStartingTxId, recovery);
                 FSImage.LOG.info("Edits file " + edits.getName() 
                     + " of size " + edits.length() + " edits # " + numEdits 
          @@ -102,7 +102,7 @@ public class FSEditLogLoader {
               }
             }
           
          -  long loadEditRecords(int logVersion, EditLogInputStream in, boolean closeOnExit,
          +  long loadEditRecords(EditLogInputStream in, boolean closeOnExit,
                                 long expectedStartingTxId, MetaRecoveryContext recovery)
                 throws IOException {
               FSDirectory fsDir = fsNamesys.dir;
          @@ -141,7 +141,7 @@ public class FSEditLogLoader {
                       }
                     } catch (Throwable e) {
                       // Handle a problem with our input
          -            check203UpgradeFailure(logVersion, e);
          +            check203UpgradeFailure(in.getVersion(), e);
                       String errorMessage =
                         formatEditLogReplayError(in, recentOpcodeOffsets, expectedTxId);
                       FSImage.LOG.error(errorMessage, e);
          @@ -158,7 +158,7 @@ public class FSEditLogLoader {
                     }
                     recentOpcodeOffsets[(int)(numEdits % recentOpcodeOffsets.length)] =
                       in.getPosition();
          -          if (LayoutVersion.supports(Feature.STORED_TXIDS, logVersion)) {
          +          if (op.hasTransactionId()) {
                       if (op.getTransactionId() > expectedTxId) { 
                         MetaRecoveryContext.editLogLoaderPrompt("There appears " +
                             "to be a gap in the edit log.  We expected txid " +
          @@ -175,7 +175,7 @@ public class FSEditLogLoader {
                       }
                     }
                     try {
          -            applyEditLogOp(op, fsDir, logVersion);
          +            applyEditLogOp(op, fsDir, in.getVersion());
                     } catch (Throwable e) {
                       LOG.error("Encountered exception on operation " + op, e);
                       MetaRecoveryContext.editLogLoaderPrompt("Failed to " +
          @@ -192,7 +192,7 @@ public class FSEditLogLoader {
                       expectedTxId = lastAppliedTxId = expectedStartingTxId;
                     }
                     // log progress
          -          if (LayoutVersion.supports(Feature.STORED_TXIDS, logVersion)) {
          +          if (op.hasTransactionId()) {
                       long now = now();
                       if (now - lastLogTime > REPLAY_TRANSACTION_LOG_INTERVAL) {
                         int percent = Math.round((float)lastAppliedTxId / numTxns * 100);
          @@ -647,76 +647,59 @@ public class FSEditLogLoader {
             }
             
             /**
          -   * Return the number of valid transactions in the stream. If the stream is
          -   * truncated during the header, returns a value indicating that there are
          -   * 0 valid transactions. This reads through the stream but does not close
          -   * it.
          +   * Find the last valid transaction ID in the stream.
          +   * If there are invalid or corrupt transactions in the middle of the stream,
          +   * validateEditLog will skip over them.
          +   * This reads through the stream but does not close it.
          +   *
              * @throws IOException if the stream cannot be read due to an IO error (eg
              *                     if the log does not exist)
              */
             static EditLogValidation validateEditLog(EditLogInputStream in) {
               long lastPos = 0;
          -    long firstTxId = HdfsConstants.INVALID_TXID;
               long lastTxId = HdfsConstants.INVALID_TXID;
               long numValid = 0;
          -    try {
          -      FSEditLogOp op = null;
          -      while (true) {
          -        lastPos = in.getPosition();
          +    FSEditLogOp op = null;
          +    while (true) {
          +      lastPos = in.getPosition();
          +      try {
                   if ((op = in.readOp()) == null) {
                     break;
                   }
          -        if (firstTxId == HdfsConstants.INVALID_TXID) {
          -          firstTxId = op.getTransactionId();
          -        }
          -        if (lastTxId == HdfsConstants.INVALID_TXID
          -            || op.getTransactionId() == lastTxId + 1) {
          -          lastTxId = op.getTransactionId();
          -        } else {
          -          FSImage.LOG.error("Out of order txid found. Found " +
          -            op.getTransactionId() + ", expected " + (lastTxId + 1));
          -          break;
          -        }
          -        numValid++;
          +      } catch (Throwable t) {
          +        FSImage.LOG.warn("Caught exception after reading " + numValid +
          +            " ops from " + in + " while determining its valid length." +
          +            "Position was " + lastPos, t);
          +        in.resync();
          +        FSImage.LOG.warn("After resync, position is " + in.getPosition());
          +        continue;
          +      }
          +      if (lastTxId == HdfsConstants.INVALID_TXID
          +          || op.getTransactionId() > lastTxId) {
          +        lastTxId = op.getTransactionId();
                 }
          -    } catch (Throwable t) {
          -      // Catch Throwable and not just IOE, since bad edits may generate
          -      // NumberFormatExceptions, AssertionErrors, OutOfMemoryErrors, etc.
          -      FSImage.LOG.debug("Caught exception after reading " + numValid +
          -          " ops from " + in + " while determining its valid length.", t);
          +      numValid++;
               }
          -    return new EditLogValidation(lastPos, firstTxId, lastTxId, false);
          +    return new EditLogValidation(lastPos, lastTxId, false);
             }
          -  
          +
             static class EditLogValidation {
               private final long validLength;
          -    private final long startTxId;
               private final long endTxId;
          -    private final boolean corruptionDetected;
          -     
          -    EditLogValidation(long validLength, long startTxId, long endTxId,
          -        boolean corruptionDetected) {
          +    private final boolean hasCorruptHeader;
          +
          +    EditLogValidation(long validLength, long endTxId,
          +        boolean hasCorruptHeader) {
                 this.validLength = validLength;
          -      this.startTxId = startTxId;
                 this.endTxId = endTxId;
          -      this.corruptionDetected = corruptionDetected;
          +      this.hasCorruptHeader = hasCorruptHeader;
               }
          -    
          +
               long getValidLength() { return validLength; }
          -    
          -    long getStartTxId() { return startTxId; }
          -    
          +
               long getEndTxId() { return endTxId; }
          -    
          -    long getNumTransactions() { 
          -      if (endTxId == HdfsConstants.INVALID_TXID
          -          || startTxId == HdfsConstants.INVALID_TXID) {
          -        return 0;
          -      }
          -      return (endTxId - startTxId) + 1;
          -    }
          -    
          -    boolean hasCorruptHeader() { return corruptionDetected; }
          +
          +    boolean hasCorruptHeader() { return hasCorruptHeader; }
             }
           
             /**
          diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
          index c5d4195..261e71a 100644
          --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
          +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
          @@ -561,7 +561,7 @@ public class FSImage implements Closeable {
           
             /**
              * Choose latest image from one of the directories,
          -   * load it and merge with the edits from that directory.
          +   * load it and merge with the edits.
              * 
              * Saving and loading fsimage should never trigger symlink resolution. 
              * The paths that are persisted do not have *intermediate* symlinks 
          diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
          index 3767111..c7b9f5c 100644
          --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
          +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
          @@ -232,10 +232,10 @@ class FileJournalManager implements JournalManager {
                     LOG.info(String.format("Log begins at txid %d, but requested start "
                         + "txid is %d. Skipping %d edits.", elf.getFirstTxId(), fromTxId,
                         transactionsToSkip));
          -        }
          -        if (elfis.skipUntil(fromTxId) == false) {
          -          throw new IOException("failed to advance input stream to txid " +
          -              fromTxId);
          +          if (elfis.skipUntil(fromTxId) == false) {
          +            throw new IOException("failed to advance input stream to txid " +
          +                fromTxId);
          +          }
                   }
                   return elfis;
                 }
          @@ -245,60 +245,6 @@ class FileJournalManager implements JournalManager {
             }
           
             @Override
          -  public long getNumberOfTransactions(long fromTxId, boolean inProgressOk)
          -      throws IOException, CorruptionException {
          -    long numTxns = 0L;
          -    
          -    for (EditLogFile elf : getLogFiles(fromTxId)) {
          -      if (LOG.isTraceEnabled()) {
          -        LOG.trace("Counting " + elf);
          -      }
          -      if (elf.getFirstTxId() > fromTxId) { // there must be a gap
          -        LOG.warn("Gap in transactions in " + sd.getRoot() + ". Gap is "
          -            + fromTxId + " - " + (elf.getFirstTxId() - 1));
          -        break;
          -      } else if (elf.containsTxId(fromTxId)) {
          -        if (!inProgressOk && elf.isInProgress()) {
          -          break;
          -        }
          -        
          -        if (elf.isInProgress()) {
          -          elf.validateLog();
          -        } 
          -
          -        if (elf.hasCorruptHeader()) {
          -          break;
          -        }
          -        numTxns += elf.getLastTxId() + 1 - fromTxId;
          -        fromTxId = elf.getLastTxId() + 1;
          -        
          -        if (elf.isInProgress()) {
          -          break;
          -        }
          -      }
          -    }
          -
          -    if (LOG.isDebugEnabled()) {
          -      LOG.debug("Journal " + this + " has " + numTxns 
          -                + " txns from " + fromTxId);
          -    }
          -
          -    long max = findMaxTransaction(inProgressOk);
          -    
          -    // fromTxId should be greater than max, as it points to the next 
          -    // transaction we should expect to find. If it is less than or equal
          -    // to max, it means that a transaction with txid == max has not been found
          -    if (numTxns == 0 && fromTxId <= max) { 
          -      String error = String.format("Gap in transactions, max txnid is %d"
          -                                   + ", 0 txns from %d", max, fromTxId);
          -      LOG.error(error);
          -      throw new CorruptionException(error);
          -    }
          -
          -    return numTxns;
          -  }
          -
          -  @Override
             synchronized public void recoverUnfinalizedSegments() throws IOException {
               File currentDir = sd.getCurrentDir();
               LOG.info("Recovering unfinalized segments in " + currentDir);
          @@ -318,7 +264,7 @@ class FileJournalManager implements JournalManager {
                     }
                     continue;
                   }
          -        
          +
                   elf.validateLog();
           
                   if (elf.hasCorruptHeader()) {
          @@ -326,19 +272,16 @@ class FileJournalManager implements JournalManager {
                     throw new CorruptionException("In-progress edit log file is corrupt: "
                         + elf);
                   }
          -        
          -        // If the file has a valid header (isn't corrupt) but contains no
          -        // transactions, we likely just crashed after opening the file and
          -        // writing the header, but before syncing any transactions. Safe to
          -        // delete the file.
          -        if (elf.getNumTransactions() == 0) {
          -          LOG.info("Deleting edit log file with zero transactions " + elf);
          -          if (!elf.getFile().delete()) {
          -            throw new IOException("Unable to delete " + elf.getFile());
          -          }
          +        if (elf.getLastTxId() == HdfsConstants.INVALID_TXID) {
          +          // If the file has a valid header (isn't corrupt) but contains no
          +          // transactions, we likely just crashed after opening the file and
+          // writing the header, but before syncing any transactions.  It is
+          // safe to move this file aside and continue.
          +          LOG.error("Moving aside edit log file that seems to have zero " +
          +              "transactions " + elf);
          +          elf.moveAsideEmptyFile();
                     continue;
                   }
          -        
                   finalizeLogSegment(elf.getFirstTxId(), elf.getLastTxId());
                 }
               }
          @@ -361,39 +304,6 @@ class FileJournalManager implements JournalManager {
               return logFiles;
             }
           
          -  /** 
          -   * Find the maximum transaction in the journal.
          -   */
          -  private long findMaxTransaction(boolean inProgressOk)
          -      throws IOException {
          -    boolean considerSeenTxId = true;
          -    long seenTxId = NNStorage.readTransactionIdFile(sd);
          -    long maxSeenTransaction = 0;
          -    for (EditLogFile elf : getLogFiles(0)) {
          -      if (elf.isInProgress() && !inProgressOk) {
          -        if (elf.getFirstTxId() != HdfsConstants.INVALID_TXID &&
          -            elf.getFirstTxId() <= seenTxId) {
          -          // don't look at the seen_txid file if in-progress logs are not to be
          -          // examined, and the value in seen_txid falls within the in-progress
          -          // segment.
          -          considerSeenTxId = false;
          -        }
          -        continue;
          -      }
          -      
          -      if (elf.isInProgress()) {
          -        maxSeenTransaction = Math.max(elf.getFirstTxId(), maxSeenTransaction);
          -        elf.validateLog();
          -      }
          -      maxSeenTransaction = Math.max(elf.getLastTxId(), maxSeenTransaction);
          -    }
          -    if (considerSeenTxId) {
          -      return Math.max(maxSeenTransaction, seenTxId);
          -    } else {
          -      return maxSeenTransaction;
          -    }
          -  }
          -
             @Override
             public String toString() {
               return String.format("FileJournalManager(root=%s)", sd.getRoot());
          @@ -406,7 +316,6 @@ class FileJournalManager implements JournalManager {
               private File file;
               private final long firstTxId;
               private long lastTxId;
          -    private long numTx = -1;
           
               private boolean hasCorruptHeader = false;
               private final boolean isInProgress;
          @@ -454,20 +363,15 @@ class FileJournalManager implements JournalManager {
               }
           
               /** 
          -     * Count the number of valid transactions in a log.
          +     * Find out where the edit log ends.
                * This will update the lastTxId of the EditLogFile or
                * mark it as corrupt if it is.
                */
               void validateLog() throws IOException {
                 EditLogValidation val = EditLogFileInputStream.validateEditLog(file);
          -      this.numTx = val.getNumTransactions();
                 this.lastTxId = val.getEndTxId();
                 this.hasCorruptHeader = val.hasCorruptHeader();
               }
          -    
          -    long getNumTransactions() {
          -      return numTx;
          -    }
           
               boolean isInProgress() {
                 return isInProgress;
          @@ -483,23 +387,31 @@ class FileJournalManager implements JournalManager {
           
               void moveAsideCorruptFile() throws IOException {
                 assert hasCorruptHeader;
          -    
          +      renameSelf(".corrupt");
          +    }
          +
          +    void moveAsideEmptyFile() throws IOException {
          +      assert lastTxId == HdfsConstants.INVALID_TXID;
          +      renameSelf(".empty");
          +    }
          +      
          +    private void renameSelf(String newSuffix) throws IOException {
                 File src = file;
          -      File dst = new File(src.getParent(), src.getName() + ".corrupt");
          +      File dst = new File(src.getParent(), src.getName() + newSuffix);
                 boolean success = src.renameTo(dst);
                 if (!success) {
                   throw new IOException(
          -          "Couldn't rename corrupt log " + src + " to " + dst);
          +          "Couldn't rename log " + src + " to " + dst);
                 }
                 file = dst;
               }
          -    
          +
               @Override
               public String toString() {
                 return String.format("EditLogFile(file=%s,first=%019d,last=%019d,"
          -                           +"inProgress=%b,hasCorruptHeader=%b,numTx=%d)",
          +                           +"inProgress=%b,hasCorruptHeader=%b)",
                                      file.toString(), firstTxId, lastTxId,
          -                           isInProgress(), hasCorruptHeader, numTx);
          +                           isInProgress(), hasCorruptHeader);
               }
             }
           }
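
A rough, self-contained sketch of the recovery decision in the hunk above; RecoverySketch and its INVALID_TXID stand-in are hypothetical, and only the ".corrupt"/".empty" suffixes and the renameTo() idiom come from the patch itself:

import java.io.File;
import java.io.IOException;

class RecoverySketch {
  static final long INVALID_TXID = -12345; // stand-in for HdfsConstants.INVALID_TXID

  /**
   * Decide what to do with an in-progress segment after validation:
   *   corrupt header  -> rename to <name>.corrupt
   *   no transactions -> rename to <name>.empty
   *   otherwise       -> return null so the caller finalizes it normally.
   */
  static File moveAsideIfNeeded(File segment, boolean corruptHeader, long lastTxId)
      throws IOException {
    String suffix = corruptHeader ? ".corrupt"
        : (lastTxId == INVALID_TXID) ? ".empty" : null;
    if (suffix == null) {
      return null;
    }
    File dst = new File(segment.getParent(), segment.getName() + suffix);
    if (!segment.renameTo(dst)) {
      throw new IOException("Couldn't rename log " + segment + " to " + dst);
    }
    return dst;
  }
}
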
          diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalManager.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalManager.java
          index f9c622d..390c38c 100644
          --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalManager.java
          +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalManager.java
          @@ -56,18 +56,6 @@ public interface JournalManager extends Closeable {
               throws IOException;
           
             /**
          -   * Get the number of transaction contiguously available from fromTxnId.
          -   *
          -   * @param fromTxnId Transaction id to count from
          -   * @param inProgressOk whether or not in-progress streams should be counted
          -   * @return The number of transactions available from fromTxnId
          -   * @throws IOException if the journal cannot be read.
          -   * @throws CorruptionException if there is a gap in the journal at fromTxnId.
          -   */
          -  long getNumberOfTransactions(long fromTxnId, boolean inProgressOk)
          -      throws IOException, CorruptionException;
          -
          -  /**
              * Set the amount of memory that this stream should use to buffer edits
              */
             void setOutputBufferCapacity(int size);
          diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java
          index d84d79d..391304d 100644
          --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java
          +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java
          @@ -20,6 +20,8 @@ package org.apache.hadoop.hdfs.server.namenode;
           import java.io.IOException;
           import java.util.ArrayList;
           import java.util.Collections;
          +import java.util.Comparator;
          +import java.util.LinkedList;
           import java.util.List;
           import java.util.SortedSet;
           
          @@ -28,6 +30,7 @@ import org.apache.commons.logging.LogFactory;
           import org.apache.hadoop.classification.InterfaceAudience;
           import org.apache.hadoop.hdfs.server.protocol.RemoteEditLog;
           import org.apache.hadoop.hdfs.server.protocol.RemoteEditLogManifest;
          +import org.apache.hadoop.io.MultipleIOException;
           
           import com.google.common.annotations.VisibleForTesting;
           import com.google.common.base.Preconditions;
          @@ -193,76 +196,67 @@ public class JournalSet implements JournalManager {
                 }
               }, "close journal");
             }
          -
             
             /**
          -   * Find the best editlog input stream to read from txid.
          -   * If a journal throws an CorruptionException while reading from a txn id,
          -   * it means that it has more transactions, but can't find any from fromTxId. 
          -   * If this is the case and no other journal has transactions, we should throw
          -   * an exception as it means more transactions exist, we just can't load them.
          +   * Get an input stream that can supply transactions starting at txid.
          +   * If there are multiple edit logs present, we will merge them into a single
          +   * RedundantEditLogInputStream.
          +   * 
          +   * If the only edit logs available starting at txid are corrupt, we'll throw an
          +   * IOException.  This means that more transactions exist, but we can't load
          +   * them.
              *
              * @param fromTxnId Transaction id to start from.
          -   * @return A edit log input stream with tranactions fromTxId 
          -   *         or null if no more exist
+   * @return An edit log input stream with transactions fromTxId, or null if 
          +   * there are no files which cover this range.
              */
             @Override
             public EditLogInputStream getInputStream(long fromTxnId, boolean inProgressOk)
                 throws IOException {
          -    JournalManager bestjm = null;
          -    long bestjmNumTxns = 0;
          -    CorruptionException corruption = null;
          +    LinkedList<EditLogInputStream> streams = new LinkedList<EditLogInputStream>();
          +    LinkedList<IOException> corruptions = new LinkedList<IOException>();
           
               for (JournalAndStream jas : journals) {
                 if (jas.isDisabled()) continue;
          -      
          +
                 JournalManager candidate = jas.getManager();
                 long candidateNumTxns = 0;
          +      EditLogInputStream elis;
                 try {
          -        candidateNumTxns = candidate.getNumberOfTransactions(fromTxnId,
          -            inProgressOk);
          +        elis = candidate.getInputStream(fromTxnId, inProgressOk);
          +        if (elis == null) {
          +          LOG.warn("No input stream found for JournalManager " + candidate +
          +            ", txid " + fromTxnId);
          +          continue;
          +        }
                 } catch (CorruptionException ce) {
          -        corruption = ce;
          +        LOG.warn("JournalManager " + candidate + " encountered a " +
          +            "CorruptionException while looking for txid " + fromTxnId, ce);
          +        corruptions.add(ce);
          +        continue;
                 } catch (IOException ioe) {
          -        LOG.warn("Unable to read input streams from JournalManager " + candidate,
          -            ioe);
          -        continue; // error reading disk, just skip
          -      }
          -      
          -      if (candidateNumTxns > bestjmNumTxns) {
          -        bestjm = candidate;
          -        bestjmNumTxns = candidateNumTxns;
          +        LOG.warn("Unable to read input stream from JournalManager " +
          +          candidate + ", txid " + fromTxnId, ioe);
          +        continue;
                 }
          +
          +      LOG.info("Examined edit log '" + elis.getName() + "'; it starts at " + 
          +        elis.getFirstTxId() + " and ends at " + elis.getLastTxId() + ".");
          +      streams.add(elis);
               }
          -    
          -    if (bestjm == null) {
          -      if (corruption != null) {
          -        throw new IOException("No non-corrupt logs for txid " 
          -                                        + fromTxnId, corruption);
          -      } else {
          +
          +    if (streams.isEmpty()) {
          +      if (corruptions.isEmpty()) {
                   return null;
          -      }
          -    }
          -    return bestjm.getInputStream(fromTxnId, inProgressOk);
          -  }
          -  
          -  @Override
          -  public long getNumberOfTransactions(long fromTxnId, boolean inProgressOk)
          -      throws IOException {
          -    long num = 0;
          -    for (JournalAndStream jas: journals) {
          -      if (jas.isDisabled()) {
          -        LOG.info("Skipping jas " + jas + " since it's disabled");
          -        continue;
                 } else {
          -        long newNum = jas.getManager().getNumberOfTransactions(fromTxnId,
          -            inProgressOk);
          -        if (newNum > num) {
          -          num = newNum;
          -        }
          +        throw new IOException("No non-corrupt logs for txid " + fromTxnId,
          +            MultipleIOException.createIOException(corruptions));
                 }
          +    } else if (streams.size() == 1) {
          +      return streams.get(0);
          +    } else {
          +      return new RedundantEditLogInputStream(streams);
               }
          -    return num;
             }
           
             /**
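
The new selection logic can be summarized by the following self-contained sketch; Journal, Merger, and select() are hypothetical stand-ins for JournalManager, RedundantEditLogInputStream, and getInputStream(), and only the null / throw / single-stream / merge decision mirrors the code above:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class SelectStreamSketch {
  static class CorruptionException extends IOException {
    CorruptionException(String msg) { super(msg); }
  }
  interface Journal<T> { T open() throws IOException; }  // stand-in for JournalManager
  interface Merger<T>  { T merge(List<T> copies); }      // stand-in for the redundant stream

  static <T> T select(List<Journal<T>> journals, Merger<T> merger) throws IOException {
    List<T> streams = new ArrayList<T>();
    List<IOException> corruptions = new ArrayList<IOException>();
    for (Journal<T> j : journals) {
      try {
        T s = j.open();
        if (s != null) {
          streams.add(s);         // this journal has a usable copy of the range
        }
      } catch (CorruptionException ce) {
        corruptions.add(ce);      // the transactions exist here but can't be read
      } catch (IOException ioe) {
        // plain I/O trouble: skip this journal entirely
      }
    }
    if (streams.isEmpty()) {
      if (corruptions.isEmpty()) {
        return null;              // no journal covers the requested txid
      }
      throw new IOException("No non-corrupt logs for requested txid");
    }
    return streams.size() == 1 ? streams.get(0) : merger.merge(streams);
  }
}
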
          diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/RedundantEditLogInputStream.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/RedundantEditLogInputStream.java
          new file mode 100644
          index 0000000..63ba506
          --- /dev/null
          +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/RedundantEditLogInputStream.java
          @@ -0,0 +1,223 @@
          +package org.apache.hadoop.hdfs.server.namenode;
          +
          +import java.io.IOException;
          +import java.util.Arrays;
          +import java.util.Collections;
          +import java.util.Comparator;
          +import java.util.LinkedList;
          +import java.util.List;
          +
          +import org.apache.commons.lang.StringUtils;
          +import org.apache.commons.logging.Log;
          +import org.apache.commons.logging.LogFactory;
          +import org.apache.hadoop.hdfs.protocol.HdfsConstants;
          +
          +import com.google.common.base.Preconditions;
          +import com.google.common.primitives.Longs;
          +
          +/**
          + * A merged input stream that handles failover between different edit logs.
          + */
          +public class RedundantEditLogInputStream extends EditLogInputStream {
          +  public static final Log LOG = LogFactory.getLog(EditLogInputStream.class.getName());
          +  private int curIdx;
          +  private long prevTxId;
          +  private final EditLogInputStream[] streams;
          +  /**
          +   * States that the RedundantEditLogInputStream can be in.
          +   */
          +  static private enum State {
          +    /** We need to skip until prevTxId + 1 */
          +    SKIP_UNTIL,
          +    /** We're ready to read opcodes out of the current stream */
          +    OK,
          +    /** The current stream has failed. */
          +    STREAM_FAILED,
          +    /** The current stream has failed, and resync() was called.  */
          +    STREAM_FAILED_RESYNC,
          +    /** There are no more opcodes to read from this
          +     * RedundantEditLogInputStream */
          +    EOF;
          +  }
          +  private State state;
          +  private IOException prevException;
          +
          +  RedundantEditLogInputStream(List<EditLogInputStream> streams) throws IOException {
          +    this.curIdx = 0;
          +    this.prevTxId = HdfsConstants.INVALID_TXID;
          +
          +    if (streams.isEmpty()) {
          +      this.state = State.EOF;
+    } else {
          +      this.state = State.OK;
          +    }
          +    prevException = null;
          +    // EditLogInputStreams in a RedundantEditLogInputStream must be finalized,
          +    // and can't be pre-transactional.
          +    for (EditLogInputStream s : streams) {
          +      Preconditions.checkArgument(s.getFirstTxId() !=
          +          HdfsConstants.INVALID_TXID);
          +      Preconditions.checkArgument(s.getLastTxId() !=
          +          HdfsConstants.INVALID_TXID);
          +      Preconditions.checkArgument(!s.isInProgress());
          +    }
          +
          +    this.streams = streams.toArray(new EditLogInputStream[0]);
          +    
          +    /* We sort the streams here so that the streams that end later come first.
          +     */
          +    Arrays.sort(this.streams, new Comparator<EditLogInputStream>() {
          +      @Override
          +      public int compare(EditLogInputStream a, EditLogInputStream b) {
          +        return Longs.compare(b.getLastTxId(), a.getLastTxId());
          +      }
          +    });
          +  }
          +  
          +  @Override
          +  public String getName() {
          +    StringBuilder bld = new StringBuilder();
          +    String prefix = "";
          +    for (EditLogInputStream s : streams) {
          +      bld.append(prefix);
          +      bld.append(s.getName());
          +      prefix = ", ";
          +    }
          +    return bld.toString();
          +  }
          +
          +  @Override
          +  public long getFirstTxId() {
          +    return streams[curIdx].getFirstTxId();
          +  }
          +
          +  @Override
          +  public long getLastTxId() {
          +    return streams[curIdx].getLastTxId();
          +  }
          +
          +  @Override
          +  public void close() throws IOException {
          +    LinkedList<Throwable> exceptions = new LinkedList<Throwable>();
          +    for (EditLogInputStream s : streams) {
          +      try {
          +        s.close();
          +      } catch (Throwable t) {
          +        LOG.error(t);
          +        exceptions.add(t);
          +      }
          +    }
          +    if (!exceptions.isEmpty()) {
          +      throw new IOException("errors while closing " +
          +          "EditLogInputStreams: " +
          +          StringUtils.join(exceptions.toArray(), ','));
          +    }
          +  }
          +
          +  @Override
          +  protected FSEditLogOp nextValidOp() {
          +    try {
          +      if (state == State.STREAM_FAILED) {
          +        state = State.STREAM_FAILED_RESYNC;
          +      }
          +      return nextOp();
          +    } catch (IOException e) {
          +      return null;
          +    }
          +  }
          +  
          +  @Override
          +  protected FSEditLogOp nextOp() throws IOException {
          +    while (true) {
          +      switch (state) {
          +      case SKIP_UNTIL:
          +        try {
          +          if (prevTxId != HdfsConstants.INVALID_TXID) {
          +            LOG.info("Fast-forwarding stream '" + streams[curIdx].getName() +
          +                "' to transaction ID " + (prevTxId + 1));
          +            streams[curIdx].skipUntil(prevTxId + 1);
          +          }
          +        } catch (IOException e) {
          +          prevException = e;
          +          state = State.STREAM_FAILED;
          +        }
          +        state = State.OK;
          +        break;
          +      case OK:
          +        try {
          +          FSEditLogOp op = streams[curIdx].readOp();
          +          if (op == null) {
          +            state = State.EOF;
          +            if (streams[curIdx].getLastTxId() == prevTxId) {
          +              return null;
          +            } else {
          +              throw new IOException("got premature end-of-file at txid " +
          +                prevTxId + "; expected file to go up to " +
          +                streams[curIdx].getLastTxId());
          +            }
          +          }
          +          prevTxId = op.getTransactionId();
          +          return op;
          +        } catch (IOException e) {
          +          prevException = e;
          +          state = State.STREAM_FAILED;
          +        }
          +        break;
          +      case STREAM_FAILED:
          +        if (curIdx + 1 == streams.length) {
          +          throw prevException;
          +        }
          +        long oldLast = streams[curIdx].getLastTxId();
          +        long newLast = streams[curIdx + 1].getLastTxId();
          +        if (newLast < oldLast) {
          +          throw new IOException("We encountered an error reading " +
          +              streams[curIdx].getName() + ".  During automatic failover, " +
          +              "we noticed that all of the remaining edit log streams are " +
          +              "shorter than the current one!  The best " + 
          +              "remaining edit log ends at transaction " + 
          +              newLast + ", but we thought we could read up to transaction " +
          +              oldLast + ".  If you continue, metadata will be lost forever!");
          +        }
          +        LOG.error("Got error reading edit log input stream " +
          +          streams[curIdx].getName(), prevException);
          +        LOG.error("failing over to edit log " + 
          +          streams[curIdx + 1].getName());
          +        curIdx++;
          +        state = State.SKIP_UNTIL;
          +        break;
          +      case STREAM_FAILED_RESYNC:
          +        if (curIdx + 1 == streams.length) {
          +          streams[curIdx].resync();
          +        } else {
          +          LOG.error("failing over to edit log " +
          +              streams[curIdx + 1].getName());
          +          curIdx++;
          +        }
          +        state = State.SKIP_UNTIL;
          +        break;
          +      case EOF:
          +        return null;
          +      }
          +    }
          +  }
          +
          +  @Override
          +  public int getVersion() throws IOException {
          +    return streams[curIdx].getVersion();
          +  }
          +
          +  @Override
          +  public long getPosition() {
          +    return streams[curIdx].getPosition();
          +  }
          +
          +  @Override
          +  public long length() throws IOException {
          +    return streams[curIdx].length();
          +  }
          +
          +  @Override
          +  public boolean isInProgress() {
          +    return false;
          +  }
          +}
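
A toy, runnable illustration of the failover behavior above. The long[] "logs", the simulated error, and FailoverSketch itself are hypothetical; only the sort-by-last-txid ordering and the skip-until-prevTxId+1 behavior mirror the class:

import java.util.Arrays;
import java.util.Comparator;

class FailoverSketch {
  public static void main(String[] args) {
    // Two redundant copies of txids 1..6; reading the first copy "fails"
    // at txid 4, forcing a failover to the second copy.
    long[][] logs = { {1, 2, 3, 4, 5, 6}, {1, 2, 3, 4, 5, 6} };
    // Prefer the copy that ends at the highest txid, as the constructor does.
    Arrays.sort(logs, new Comparator<long[]>() {
      @Override
      public int compare(long[] a, long[] b) {
        return Long.compare(b[b.length - 1], a[a.length - 1]);
      }
    });
    long prevTxId = 0;
    for (int cur = 0; cur < logs.length; cur++) {
      for (long txid : logs[cur]) {
        if (txid <= prevTxId) {
          continue;              // SKIP_UNTIL: already returned from an earlier copy
        }
        if (cur == 0 && txid == 4) {
          System.out.println("simulated read error at txid 4, failing over");
          break;                 // STREAM_FAILED: fall through to the next copy
        }
        System.out.println("read txid " + txid);
        prevTxId = txid;
      }
    }
  }
}
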
          diff --git hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
          index b418fcf..c396d82 100644
          --- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
          +++ hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
          @@ -131,6 +131,7 @@ public class MiniDFSCluster {
               private int numDataNodes = 1;
               private boolean format = true;
               private boolean manageNameDfsDirs = true;
          +    private boolean enableManagedDfsDirsRedundancy = true;
               private boolean manageDataDfsDirs = true;
               private StartupOption option = null;
               private String[] racks = null; 
          @@ -184,6 +185,14 @@ public class MiniDFSCluster {
                 this.manageNameDfsDirs = val;
                 return this;
               }
          +    
          +    /**
          +     * Default: true
          +     */
          +    public Builder enableManagedDfsDirsRedundancy(boolean val) {
          +      this.enableManagedDfsDirsRedundancy = val;
          +      return this;
          +    }
           
               /**
                * Default: true
          @@ -286,6 +295,7 @@ public class MiniDFSCluster {
                                  builder.numDataNodes,
                                  builder.format,
                                  builder.manageNameDfsDirs,
          +                       builder.enableManagedDfsDirsRedundancy,
                                  builder.manageDataDfsDirs,
                                  builder.option,
                                  builder.racks,
          @@ -373,7 +383,7 @@ public class MiniDFSCluster {
             public MiniDFSCluster(Configuration conf,
                                   int numDataNodes,
                                   StartupOption nameNodeOperation) throws IOException {
          -    this(0, conf, numDataNodes, false, false, false,  nameNodeOperation, 
          +    this(0, conf, numDataNodes, false, false, false, false,  nameNodeOperation, 
                     null, null, null);
             }
             
          @@ -395,7 +405,8 @@ public class MiniDFSCluster {
                                   int numDataNodes,
                                   boolean format,
                                   String[] racks) throws IOException {
          -    this(0, conf, numDataNodes, format, true, true,  null, racks, null, null);
          +    this(0, conf, numDataNodes, format, true, true, true, null,
          +        racks, null, null);
             }
             
             /**
          @@ -417,7 +428,8 @@ public class MiniDFSCluster {
                                   int numDataNodes,
                                   boolean format,
                                   String[] racks, String[] hosts) throws IOException {
          -    this(0, conf, numDataNodes, format, true, true, null, racks, hosts, null);
          +    this(0, conf, numDataNodes, format, true, true, true, null,
          +        racks, hosts, null);
             }
             
             /**
          @@ -450,8 +462,8 @@ public class MiniDFSCluster {
                                   boolean manageDfsDirs,
                                   StartupOption operation,
                                   String[] racks) throws IOException {
          -    this(nameNodePort, conf, numDataNodes, format, manageDfsDirs, manageDfsDirs,
          -         operation, racks, null, null);
          +    this(nameNodePort, conf, numDataNodes, format, manageDfsDirs,
          +        manageDfsDirs, manageDfsDirs, operation, racks, null, null);
             }
           
             /**
          @@ -485,7 +497,7 @@ public class MiniDFSCluster {
                                   String[] racks,
                                   long[] simulatedCapacities) throws IOException {
               this(nameNodePort, conf, numDataNodes, format, manageDfsDirs, manageDfsDirs,
          -          operation, racks, null, simulatedCapacities);
          +        manageDfsDirs, operation, racks, null, simulatedCapacities);
             }
             
             /**
          @@ -519,13 +531,15 @@ public class MiniDFSCluster {
                                   int numDataNodes,
                                   boolean format,
                                   boolean manageNameDfsDirs,
          +                        boolean enableManagedDfsDirsRedundancy,
                                   boolean manageDataDfsDirs,
                                   StartupOption operation,
                                   String[] racks, String hosts[],
                                   long[] simulatedCapacities) throws IOException {
               this.nameNodes = new NameNodeInfo[1]; // Single namenode in the cluster
               initMiniDFSCluster(conf, numDataNodes, format,
          -        manageNameDfsDirs, manageDataDfsDirs, operation, racks, hosts,
          +        manageNameDfsDirs, enableManagedDfsDirsRedundancy, manageDataDfsDirs,
          +        operation, racks, hosts,
                   simulatedCapacities, null, true, false,
                   MiniDFSNNTopology.simpleSingleNN(nameNodePort, 0));
             }
          @@ -533,6 +547,7 @@ public class MiniDFSCluster {
             private void initMiniDFSCluster(
                 Configuration conf,
                 int numDataNodes, boolean format, boolean manageNameDfsDirs,
          +      boolean enableManagedDfsDirsRedundancy,
                 boolean manageDataDfsDirs, StartupOption operation, String[] racks,
                 String[] hosts, long[] simulatedCapacities, String clusterId,
                 boolean waitSafeMode, boolean setupHostsFile,
          @@ -572,7 +587,8 @@ public class MiniDFSCluster {
               
               federation = nnTopology.isFederated();
               createNameNodesAndSetConf(
          -        nnTopology, manageNameDfsDirs, format, operation, clusterId, conf);
          +        nnTopology, manageNameDfsDirs, enableManagedDfsDirsRedundancy,
          +        format, operation, clusterId, conf);
               
               if (format) {
                 if (data_dir.exists() && !FileUtil.fullyDelete(data_dir)) {
          @@ -593,7 +609,8 @@ public class MiniDFSCluster {
             }
             
             private void createNameNodesAndSetConf(MiniDFSNNTopology nnTopology,
          -      boolean manageNameDfsDirs, boolean format, StartupOption operation,
          +      boolean manageNameDfsDirs, boolean enableManagedDfsDirsRedundancy,
          +      boolean format, StartupOption operation,
                 String clusterId,
                 Configuration conf) throws IOException {
               Preconditions.checkArgument(nnTopology.countNameNodes() > 0,
          @@ -650,7 +667,7 @@ public class MiniDFSCluster {
                 Collection<URI> prevNNDirs = null;
                 int nnCounterForFormat = nnCounter;
                 for (NNConf nn : nameservice.getNNs()) {
          -        initNameNodeConf(conf, nsId, nn.getNnId(), manageNameDfsDirs,
          +        initNameNodeConf(conf, nsId, nn.getNnId(), manageNameDfsDirs, manageNameDfsDirs,
                       nnCounterForFormat);
                   Collection<URI> namespaceDirs = FSNamesystem.getNamespaceDirs(conf);
                   if (format) {
          @@ -682,7 +699,8 @@ public class MiniDFSCluster {
           
                 // Start all Namenodes
                 for (NNConf nn : nameservice.getNNs()) {
          -        initNameNodeConf(conf, nsId, nn.getNnId(), manageNameDfsDirs, nnCounter);
          +        initNameNodeConf(conf, nsId, nn.getNnId(), manageNameDfsDirs,
          +            enableManagedDfsDirsRedundancy, nnCounter);
                   createNameNode(nnCounter++, conf, numDataNodes, false, operation,
                       clusterId, nsId, nn.getNnId());
                 }
          @@ -707,8 +725,8 @@ public class MiniDFSCluster {
           
             private void initNameNodeConf(Configuration conf,
                 String nameserviceId, String nnId,
          -      boolean manageNameDfsDirs, int nnIndex)
          -      throws IOException {
          +      boolean manageNameDfsDirs, boolean enableManagedDfsDirsRedundancy,
          +      int nnIndex) throws IOException {
               if (nameserviceId != null) {
                 conf.set(DFS_FEDERATION_NAMESERVICE_ID, nameserviceId);
               }
          @@ -717,12 +735,21 @@ public class MiniDFSCluster {
               }
               
               if (manageNameDfsDirs) {
          -      conf.set(DFS_NAMENODE_NAME_DIR_KEY,
          -          fileAsURI(new File(base_dir, "name" + (2*nnIndex + 1)))+","+
          -          fileAsURI(new File(base_dir, "name" + (2*nnIndex + 2))));
          -      conf.set(DFS_NAMENODE_CHECKPOINT_DIR_KEY,
          -          fileAsURI(new File(base_dir, "namesecondary" + (2*nnIndex + 1)))+","+
          -          fileAsURI(new File(base_dir, "namesecondary" + (2*nnIndex + 2))));
          +      if (enableManagedDfsDirsRedundancy) {
          +        conf.set(DFS_NAMENODE_NAME_DIR_KEY,
          +            fileAsURI(new File(base_dir, "name" + (2*nnIndex + 1)))+","+
          +            fileAsURI(new File(base_dir, "name" + (2*nnIndex + 2))));
          +        conf.set(DFS_NAMENODE_CHECKPOINT_DIR_KEY,
          +            fileAsURI(new File(base_dir, "namesecondary" + (2*nnIndex + 1)))+","+
          +            fileAsURI(new File(base_dir, "namesecondary" + (2*nnIndex + 2))));
          +      } else {
          +        conf.set(DFS_NAMENODE_NAME_DIR_KEY,
          +            fileAsURI(new File(base_dir, "name" + (2*nnIndex + 1))).
          +              toString());
          +        conf.set(DFS_NAMENODE_CHECKPOINT_DIR_KEY,
          +            fileAsURI(new File(base_dir, "namesecondary" + (2*nnIndex + 1))).
          +              toString());
          +      }
               }
             }
           
          @@ -2118,7 +2145,7 @@ public class MiniDFSCluster {
               String nnId = null;
               initNameNodeAddress(conf, nameserviceId,
                   new NNConf(nnId).setIpcPort(namenodePort));
          -    initNameNodeConf(conf, nameserviceId, nnId, true, nnIndex);
          +    initNameNodeConf(conf, nameserviceId, nnId, true, true, nnIndex);
               createNameNode(nnIndex, conf, numDataNodes, true, null, null,
                   nameserviceId, nnId);
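
The new builder flag is exercised by TestFSEditLogLoader further down in this patch; a test that wants a single, non-redundant name directory would look roughly like the hypothetical skeleton below (assumes the usual MiniDFSCluster test imports):

// Hypothetical test skeleton; the builder call mirrors TestFSEditLogLoader below.
public void testWithSingleNameDir() throws IOException {
  Configuration conf = new HdfsConfiguration();
  MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
      .numDataNodes(1)
      .enableManagedDfsDirsRedundancy(false)  // one name dir, one checkpoint dir
      .build();
  try {
    cluster.waitActive();
    // ... corrupt the single copy of the edit log, restart, etc. ...
  } finally {
    cluster.shutdown();
  }
}
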
           
          diff --git hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          index cae3b0d..d0466f5 100644
          --- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          +++ hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          @@ -22,6 +22,7 @@ import java.io.*;
           import java.net.URI;
           import java.util.Collection;
           import java.util.Iterator;
          +import java.util.LinkedList;
           import java.util.List;
           import java.util.ArrayList;
           import java.util.Collections;
          @@ -505,21 +506,28 @@ public class TestEditLog extends TestCase {
               FSImage fsimage = namesystem.getFSImage();
               final FSEditLog editLog = fsimage.getEditLog();
               fileSys.mkdirs(new Path("/tmp"));
          -    StorageDirectory sd = fsimage.getStorage().dirIterator(NameNodeDirType.EDITS).next();
          +    Iterator<StorageDirectory> iter = fsimage.getStorage().
          +      dirIterator(NameNodeDirType.EDITS);
          +    LinkedList<StorageDirectory> sds = new LinkedList<StorageDirectory>();
          +    while (iter.hasNext()) {
          +      sds.add(iter.next());
          +    }
               editLog.close();
               cluster.shutdown();
           
          -    File editFile = NNStorage.getFinalizedEditsFile(sd, 1, 3);
          -    assertTrue(editFile.exists());
          -
          -    long fileLen = editFile.length();
          -    System.out.println("File name: " + editFile + " len: " + fileLen);
          -    RandomAccessFile rwf = new RandomAccessFile(editFile, "rw");
          -    rwf.seek(fileLen-4); // seek to checksum bytes
          -    int b = rwf.readInt();
          -    rwf.seek(fileLen-4);
          -    rwf.writeInt(b+1);
          -    rwf.close();
          +    for (StorageDirectory sd : sds) {
          +      File editFile = NNStorage.getFinalizedEditsFile(sd, 1, 3);
          +      assertTrue(editFile.exists());
          +  
          +      long fileLen = editFile.length();
          +      LOG.debug("Corrupting Log File: " + editFile + " len: " + fileLen);
          +      RandomAccessFile rwf = new RandomAccessFile(editFile, "rw");
          +      rwf.seek(fileLen-4); // seek to checksum bytes
          +      int b = rwf.readInt();
          +      rwf.seek(fileLen-4);
          +      rwf.writeInt(b+1);
          +      rwf.close();
          +    }
               
               try {
                 cluster = new MiniDFSCluster.Builder(conf).numDataNodes(NUM_DATA_NODES).format(false).build();
          @@ -739,8 +747,9 @@ public class TestEditLog extends TestCase {
                   throw ioe;
                 } else {
                   GenericTestUtils.assertExceptionContains(
          -            "No non-corrupt logs for txid 3",
          -            ioe);
          +          "Gap in transactions. Expected to be able to read up until " +
          +          "at least txid 3 but unable to find any edit logs containing " +
          +          "txid 3", ioe);
                 }
               } finally {
                 cluster.shutdown();
          @@ -769,12 +778,12 @@ public class TestEditLog extends TestCase {
               }
             
               @Override
          -    public long getFirstTxId() throws IOException {
          +    public long getFirstTxId() {
                 return HdfsConstants.INVALID_TXID;
               }
               
               @Override
          -    public long getLastTxId() throws IOException {
          +    public long getLastTxId() {
                 return HdfsConstants.INVALID_TXID;
               }
             
          @@ -1103,9 +1112,9 @@ public class TestEditLog extends TestCase {
           
               for (EditLogInputStream edits : editStreams) {
                 FSEditLogLoader.EditLogValidation val = FSEditLogLoader.validateEditLog(edits);
          -      long read = val.getNumTransactions();
          +      long read = (val.getEndTxId() - edits.getFirstTxId()) + 1;
                 LOG.info("Loading edits " + edits + " read " + read);
          -      assertEquals(startTxId, val.getStartTxId());
          +      assertEquals(startTxId, edits.getFirstTxId());
                 startTxId += read;
                 totaltxnread += read;
               }
          @@ -1153,7 +1162,9 @@ public class TestEditLog extends TestCase {
                 fail("Should have thrown exception");
               } catch (IOException ioe) {
                 GenericTestUtils.assertExceptionContains(
          -          "No non-corrupt logs for txid " + startGapTxId, ioe);
          +          "Gap in transactions. Expected to be able to read up until " +
          +          "at least txid 40 but unable to find any edit logs containing " +
          +          "txid 11", ioe);
               }
             }
           
          @@ -1225,6 +1236,114 @@ public class TestEditLog extends TestCase {
                 byte[] garbage = new byte[r.nextInt(MAX_GARBAGE_LENGTH)];
                 r.nextBytes(garbage);
                 validateNoCrash(garbage);
          +
          +    }
          +  }
          +
          +  private static long readAllEdits(Collection<EditLogInputStream> streams,
          +      long startTxId) throws IOException {
          +    FSEditLogOp op;
          +    long nextTxId = startTxId;
          +    long numTx = 0;
          +    for (EditLogInputStream s : streams) {
          +      while (true) {
          +        op = s.readOp();
          +        if (op == null)
          +          break;
          +        if (op.getTransactionId() != nextTxId) {
          +          throw new IOException("out of order transaction ID!  expected " +
          +              nextTxId + " but got " + op.getTransactionId() + " when " +
          +              "reading " + s.getName());
          +        }
          +        numTx++;
          +        nextTxId = op.getTransactionId() + 1;
          +      }
          +    }
          +    return numTx;
          +  }
          +
          +  /**
          +   * Test edit log failover.  If a single edit log is missing, other 
+   * edit logs should be used instead.
          +   */
          +  @Test
          +  public void testEditLogFailOverFromMissing() throws IOException {
          +    File f1 = new File(TEST_DIR + "/failover0");
          +    File f2 = new File(TEST_DIR + "/failover1");
          +    List<URI> editUris = ImmutableList.of(f1.toURI(), f2.toURI());
          +
          +    NNStorage storage = setupEdits(editUris, 3);
          +    
          +    final long startErrorTxId = 1*TXNS_PER_ROLL + 1;
          +    final long endErrorTxId = 2*TXNS_PER_ROLL;
          +
          +    File[] files = new File(f1, "current").listFiles(new FilenameFilter() {
          +        public boolean accept(File dir, String name) {
          +          if (name.startsWith(NNStorage.getFinalizedEditsFileName(startErrorTxId, 
          +                                  endErrorTxId))) {
          +            return true;
          +          }
          +          return false;
          +        }
          +      });
          +    assertEquals(1, files.length);
          +    assertTrue(files[0].delete());
          +
          +    FSEditLog editlog = getFSEditLog(storage);
          +    editlog.initJournalsForWrite();
          +    long startTxId = 1;
          +    try {
          +      readAllEdits(editlog.selectInputStreams(startTxId, 4*TXNS_PER_ROLL),
          +          startTxId);
          +    } catch (IOException e) {
          +      LOG.error("edit log failover didn't work", e);
          +      fail("Edit log failover didn't work");
          +    }
          +  }
          +
          +  /** 
          +   * Test edit log failover from a corrupt edit log
          +   */
          +  @Test
          +  public void testEditLogFailOverFromCorrupt() throws IOException {
          +    File f1 = new File(TEST_DIR + "/failover0");
          +    File f2 = new File(TEST_DIR + "/failover1");
          +    List<URI> editUris = ImmutableList.of(f1.toURI(), f2.toURI());
          +
          +    NNStorage storage = setupEdits(editUris, 3);
          +    
          +    final long startErrorTxId = 1*TXNS_PER_ROLL + 1;
          +    final long endErrorTxId = 2*TXNS_PER_ROLL;
          +
          +    File[] files = new File(f1, "current").listFiles(new FilenameFilter() {
          +        public boolean accept(File dir, String name) {
          +          if (name.startsWith(NNStorage.getFinalizedEditsFileName(startErrorTxId, 
          +                                  endErrorTxId))) {
          +            return true;
          +          }
          +          return false;
          +        }
          +      });
          +    assertEquals(1, files.length);
          +
          +    long fileLen = files[0].length();
          +    LOG.debug("Corrupting Log File: " + files[0] + " len: " + fileLen);
          +    RandomAccessFile rwf = new RandomAccessFile(files[0], "rw");
          +    rwf.seek(fileLen-4); // seek to checksum bytes
          +    int b = rwf.readInt();
          +    rwf.seek(fileLen-4);
          +    rwf.writeInt(b+1);
          +    rwf.close();
          +    
          +    FSEditLog editlog = getFSEditLog(storage);
          +    editlog.initJournalsForWrite();
          +    long startTxId = 1;
          +    try {
          +      readAllEdits(editlog.selectInputStreams(startTxId, 4*TXNS_PER_ROLL),
          +          startTxId);
          +    } catch (IOException e) {
          +      LOG.error("edit log failover didn't work", e);
          +      fail("Edit log failover didn't work");
               }
             }
           }
          diff --git hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java
          index ebcec96..44066ae 100644
          --- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java
          +++ hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java
          @@ -58,14 +58,15 @@ public class TestEditLogFileOutputStream {
               MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(0)
                   .build();
           
          +    final long START_TXID = 1;
               StorageDirectory sd = cluster.getNameNode().getFSImage()
                 .getStorage().getStorageDir(0);
          -    File editLog = NNStorage.getInProgressEditsFile(sd, 1);
          +    File editLog = NNStorage.getInProgressEditsFile(sd, START_TXID);
           
               EditLogValidation validation = EditLogFileInputStream.validateEditLog(editLog);
               assertEquals("Edit log should contain a header as valid length",
                   HEADER_LEN, validation.getValidLength());
          -    assertEquals(1, validation.getNumTransactions());
+    assertEquals(START_TXID, validation.getEndTxId());
               assertEquals("Edit log should have 1MB pre-allocated, plus 4 bytes " +
                   "for the version number",
                   EditLogFileOutputStream.PREALLOCATION_LENGTH + 4, editLog.length());
          @@ -79,7 +80,7 @@ public class TestEditLogFileOutputStream {
               assertTrue("Edit log should have more valid data after writing a txn " +
                   "(was: " + oldLength + " now: " + validation.getValidLength() + ")",
                   validation.getValidLength() > oldLength);
          -    assertEquals(2, validation.getNumTransactions());
          +    assertEquals(1, validation.getEndTxId() - START_TXID);
           
               assertEquals("Edit log should be 1MB long, plus 4 bytes for the version number",
                   EditLogFileOutputStream.PREALLOCATION_LENGTH + 4, editLog.length());
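
With getNumTransactions() removed from EditLogValidation, the tests derive the transaction count from txid bounds instead. For a contiguous segment the arithmetic is simply endTxId - firstTxId + 1: the finalized segment covering txids 1..3 used by TestEditLog above validates with endTxId = 3, giving 3 - 1 + 1 = 3 transactions, while here a freshly opened in-progress log whose only entry is OP_START_LOG_SEGMENT at txid 1 validates with endTxId == START_TXID.
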
          diff --git hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java
          index 1917dde..ba4b9a4 100644
          --- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java
          +++ hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java
          @@ -28,6 +28,7 @@ import java.io.IOException;
           import java.io.RandomAccessFile;
           import java.nio.channels.FileChannel;
           import java.util.Map;
          +import java.util.Set;
           import java.util.SortedMap;
           
           import org.apache.commons.logging.impl.Log4JLogger;
          @@ -38,16 +39,23 @@ import org.apache.hadoop.hdfs.DFSConfigKeys;
           import org.apache.hadoop.hdfs.DFSTestUtil;
           import org.apache.hadoop.hdfs.HdfsConfiguration;
           import org.apache.hadoop.hdfs.MiniDFSCluster;
          +import org.apache.hadoop.hdfs.protocol.HdfsConstants;
           import org.apache.hadoop.hdfs.server.common.Storage.StorageDirectory;
           import org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.EditLogValidation;
          +import org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.DeleteOp;
          +import org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.OpInstanceCache;
           import org.apache.hadoop.hdfs.server.namenode.NNStorage.NameNodeDirType;
           import org.apache.hadoop.io.IOUtils;
           import org.apache.log4j.Level;
           import org.junit.Test;
           
           import com.google.common.collect.Maps;
          +import com.google.common.collect.Sets;
           import com.google.common.io.Files;
           
          +import static org.mockito.Mockito.doNothing;
          +import static org.mockito.Mockito.spy;
          +
           public class TestFSEditLogLoader {
             
             static {
          @@ -66,8 +74,8 @@ public class TestFSEditLogLoader {
               Configuration conf = new HdfsConfiguration();
               MiniDFSCluster cluster = null;
               FileSystem fileSys = null;
          -    cluster = new MiniDFSCluster.Builder(conf).numDataNodes(NUM_DATA_NODES)
          -        .build();
          +    cluster = new MiniDFSCluster.Builder(conf).numDataNodes(NUM_DATA_NODES).
          +        enableManagedDfsDirsRedundancy(false).build();
               cluster.waitActive();
               fileSys = cluster.getFileSystem();
               final FSNamesystem namesystem = cluster.getNamesystem();
          @@ -152,108 +160,6 @@ public class TestFSEditLogLoader {
             }
             
             /**
          -   * Test that the valid number of transactions can be counted from a file.
          -   * @throws IOException 
          -   */
          -  @Test
          -  public void testCountValidTransactions() throws IOException {
          -    File testDir = new File(TEST_DIR, "testCountValidTransactions");
          -    File logFile = new File(testDir,
          -        NNStorage.getInProgressEditsFileName(1));
          -    
          -    // Create a log file, and return the offsets at which each
          -    // transaction starts.
          -    FSEditLog fsel = null;
          -    final int NUM_TXNS = 30;
          -    SortedMap<Long, Long> offsetToTxId = Maps.newTreeMap();
          -    try {
          -      fsel = FSImageTestUtil.createStandaloneEditLog(testDir);
          -      fsel.openForWrite();
          -      assertTrue("should exist: " + logFile, logFile.exists());
          -      
          -      for (int i = 0; i < NUM_TXNS; i++) {
          -        long trueOffset = getNonTrailerLength(logFile);
          -        long thisTxId = fsel.getLastWrittenTxId() + 1;
          -        offsetToTxId.put(trueOffset, thisTxId);
          -        System.err.println("txid " + thisTxId + " at offset " + trueOffset);
          -        fsel.logDelete("path" + i, i);
          -        fsel.logSync();
          -      }
          -    } finally {
          -      if (fsel != null) {
          -        fsel.close();
          -      }
          -    }
          -
          -    // The file got renamed when the log was closed.
          -    logFile = testDir.listFiles()[0];
          -    long validLength = getNonTrailerLength(logFile);
          -
          -    // Make sure that uncorrupted log has the expected length and number
          -    // of transactions.
          -    EditLogValidation validation = EditLogFileInputStream.validateEditLog(logFile);
          -    assertEquals(NUM_TXNS + 2, validation.getNumTransactions());
          -    assertEquals(validLength, validation.getValidLength());
          -    
          -    // Back up the uncorrupted log
          -    File logFileBak = new File(testDir, logFile.getName() + ".bak");
          -    Files.copy(logFile, logFileBak);
          -
          -    // Corrupt the log file in various ways for each txn
          -    for (Map.Entry<Long, Long> entry : offsetToTxId.entrySet()) {
          -      long txOffset = entry.getKey();
          -      long txid = entry.getValue();
          -      
          -      // Restore backup, truncate the file exactly before the txn
          -      Files.copy(logFileBak, logFile);
          -      truncateFile(logFile, txOffset);
          -      validation = EditLogFileInputStream.validateEditLog(logFile);
          -      assertEquals("Failed when truncating to length " + txOffset,
          -          txid - 1, validation.getNumTransactions());
          -      assertEquals(txOffset, validation.getValidLength());
          -
          -      // Restore backup, truncate the file with one byte in the txn,
          -      // also isn't valid
          -      Files.copy(logFileBak, logFile);
          -      truncateFile(logFile, txOffset + 1);
          -      validation = EditLogFileInputStream.validateEditLog(logFile);
          -      assertEquals("Failed when truncating to length " + (txOffset + 1),
          -          txid - 1, validation.getNumTransactions());
          -      assertEquals(txOffset, validation.getValidLength());
          -
          -      // Restore backup, corrupt the txn opcode
          -      Files.copy(logFileBak, logFile);
          -      corruptByteInFile(logFile, txOffset);
          -      validation = EditLogFileInputStream.validateEditLog(logFile);
          -      assertEquals("Failed when corrupting txn opcode at " + txOffset,
          -          txid - 1, validation.getNumTransactions());
          -      assertEquals(txOffset, validation.getValidLength());
          -
          -      // Restore backup, corrupt a byte a few bytes into the txn
          -      Files.copy(logFileBak, logFile);
          -      corruptByteInFile(logFile, txOffset+5);
          -      validation = EditLogFileInputStream.validateEditLog(logFile);
          -      assertEquals("Failed when corrupting txn data at " + (txOffset+5),
          -          txid - 1, validation.getNumTransactions());
          -      assertEquals(txOffset, validation.getValidLength());
          -    }
          -    
          -    // Corrupt the log at every offset to make sure that validation itself
          -    // never throws an exception, and that the calculated lengths are monotonically
          -    // increasing
          -    long prevNumValid = 0;
          -    for (long offset = 0; offset < validLength; offset++) {
          -      Files.copy(logFileBak, logFile);
          -      corruptByteInFile(logFile, offset);
          -      EditLogValidation val = EditLogFileInputStream.validateEditLog(logFile);
          -      assertTrue(String.format("%d should have been >= %d",
          -          val.getNumTransactions(), prevNumValid),
          -          val.getNumTransactions() >= prevNumValid);
          -      prevNumValid = val.getNumTransactions();
          -    }
          -  }
          -
          -  /**
              * Corrupt the byte at the given offset in the given file,
              * by subtracting 1 from it.
              */
          @@ -316,4 +222,116 @@ public class TestFSEditLogLoader {
                 fis.close();
               }
             }
          -}
          +
          +  static private File prepareUnfinalizedTestEditLog(File testDir, int numTx,
          +      SortedMap<Long, Long> offsetToTxId) throws IOException {
          +    File inProgressFile = new File(testDir, NNStorage.getInProgressEditsFileName(1));
          +    FSEditLog fsel = null, spyLog = null;
          +    try {
          +      fsel = FSImageTestUtil.createStandaloneEditLog(testDir);
          +      spyLog = spy(fsel);
          +      // Normally, the in-progress edit log would be finalized by
          +      // FSEditLog#endCurrentLogSegment.  For testing purposes, we
          +      // disable that here.
          +      doNothing().when(spyLog).endCurrentLogSegment(true);
          +      spyLog.openForWrite();
          +      assertTrue("should exist: " + inProgressFile, inProgressFile.exists());
          +      
          +      for (int i = 0; i < numTx; i++) {
          +        long trueOffset = getNonTrailerLength(inProgressFile);
          +        long thisTxId = spyLog.getLastWrittenTxId() + 1;
          +        offsetToTxId.put(trueOffset, thisTxId);
          +        System.err.println("txid " + thisTxId + " at offset " + trueOffset);
          +        spyLog.logDelete("path" + i, i);
          +        spyLog.logSync();
          +      }
          +    } finally {
          +      if (spyLog != null) {
          +        spyLog.close();
          +      } else if (fsel != null) {
          +        fsel.close();
          +      }
          +    }
          +    return inProgressFile;
          +  }
          +
          +  @Test
          +  public void testValidateEditLogWithCorruptHeader() throws IOException {
          +    File testDir = new File(TEST_DIR, "testValidateEditLogWithCorruptHeader");
          +    SortedMap<Long, Long> offsetToTxId = Maps.newTreeMap();
          +    File logFile = prepareUnfinalizedTestEditLog(testDir, 2, offsetToTxId);
          +    RandomAccessFile rwf = new RandomAccessFile(logFile, "rw");
          +    try {
          +      rwf.seek(0);
          +      rwf.writeLong(42); // corrupt header
          +    } finally {
          +      rwf.close();
          +    }
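          +    // validateEditLog should flag the mangled header rather than throw.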
          +    EditLogValidation validation = EditLogFileInputStream.validateEditLog(logFile);
          +    assertTrue(validation.hasCorruptHeader());
          +  }
          +
          +  @Test
          +  public void testValidateEditLogWithCorruptBody() throws IOException {
          +    File testDir = new File(TEST_DIR, "testValidateEditLogWithCorruptBody");
          +    SortedMap<Long, Long> offsetToTxId = Maps.newTreeMap();
          +    final int NUM_TXNS = 20;
          +    File logFile = prepareUnfinalizedTestEditLog(testDir, NUM_TXNS,
          +        offsetToTxId);
          +    // Back up the uncorrupted log
          +    File logFileBak = new File(testDir, logFile.getName() + ".bak");
          +    Files.copy(logFile, logFileBak);
          +    EditLogValidation validation =
          +        EditLogFileInputStream.validateEditLog(logFile);
          +    assertTrue(!validation.hasCorruptHeader());
          +    // We expect an OP_START_LOG_SEGMENT (txid 1) followed by NUM_TXNS
          +    // opcodes; since the log is never finalized there is no
          +    // OP_END_LOG_SEGMENT, so the last valid txid is NUM_TXNS + 1.
          +    assertEquals(NUM_TXNS + 1, validation.getEndTxId());
          +    // Corrupt each edit and verify that validation continues to work
          +    for (Map.Entry<Long, Long> entry : offsetToTxId.entrySet()) {
          +      long txOffset = entry.getKey();
          +      long txId = entry.getValue();
          +
          +      // Restore backup, corrupt the txn opcode
          +      Files.copy(logFileBak, logFile);
          +      FSImage.LOG.error("WATERMELON txId = " + txId + ", txOffset = " + txOffset);
          +      corruptByteInFile(logFile, txOffset);
          +      validation = EditLogFileInputStream.validateEditLog(logFile);
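          +      // Corrupting the opcode of the final transaction hides only that
          +      // transaction; corrupting an earlier one is skipped over, so the rest
          +      // of the log (up to NUM_TXNS + 1) is still readable.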
          +      long expectedEndTxId = (txId == (NUM_TXNS + 1)) ?
          +          NUM_TXNS : (NUM_TXNS + 1);
          +      assertEquals("Failed when corrupting txn opcode at " + txOffset,
          +          expectedEndTxId, validation.getEndTxId());
          +      assertTrue(!validation.hasCorruptHeader());
          +    }
          +
          +    // Truncate right before each edit and verify that validation continues
          +    // to work
          +    for (Map.Entry<Long, Long> entry : offsetToTxId.entrySet()) {
          +      long txOffset = entry.getKey();
          +      long txId = entry.getValue();
          +
          +      // Restore backup, then truncate the file just before this txn
          +      Files.copy(logFileBak, logFile);
          +      truncateFile(logFile, txOffset);
          +      validation = EditLogFileInputStream.validateEditLog(logFile);
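          +      // Truncating just before a transaction leaves txId - 1 as the last
          +      // readable txid (or INVALID_TXID if nothing at all remains).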
          +      long expectedEndTxId = (txId == 0) ?
          +          HdfsConstants.INVALID_TXID : (txId - 1);
          +      assertEquals("Failed when corrupting txid " + txId + " txn opcode " +
          +        "at " + txOffset, expectedEndTxId, validation.getEndTxId());
          +      assertTrue(!validation.hasCorruptHeader());
          +    }
          +  }
          +  
          +  @Test
          +  public void testValidateEmptyEditLog() throws IOException {
          +    File testDir = new File(TEST_DIR, "testValidateEmptyEditLog");
          +    SortedMap<Long, Long> offsetToTxId = Maps.newTreeMap();
          +    File logFile = prepareUnfinalizedTestEditLog(testDir, 0, offsetToTxId);
          +    // Truncate the file so that there is nothing except the header
          +    truncateFile(logFile, 4);
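          +    // A header-only log contains no transactions, so the end txid is
          +    // INVALID_TXID, but the header itself is not considered corrupt.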
          +    EditLogValidation validation =
          +        EditLogFileInputStream.validateEditLog(logFile);
          +    assertTrue(!validation.hasCorruptHeader());
          +    assertEquals(HdfsConstants.INVALID_TXID, validation.getEndTxId());
          +  }
          +}
          \ No newline at end of file
          diff --git hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java
          index 0ac1944..c13ff55 100644
          --- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java
          +++ hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java
          @@ -44,6 +44,44 @@ import com.google.common.base.Joiner;
           
           public class TestFileJournalManager {
           
          +  /**
          +   * Find out how many transactions we can read from a
          +   * FileJournalManager, starting at a given transaction ID.
          +   * 
          +   * @param jm              The journal manager
          +   * @param fromTxId        Transaction ID to start at
          +   * @param inProgressOk    Should we consider edit logs that are not finalized?
          +   * @return                The number of transactions
          +   * @throws IOException
          +   */
          +  static long getNumberOfTransactions(FileJournalManager jm, long fromTxId,
          +      boolean inProgressOk) throws IOException {
          +    long txId = fromTxId;
          +    long numTransactions = 0;
          +    EditLogInputStream elis;
          +    while (true) {
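          +      // Ask the journal manager for a stream starting at txId; once no edit
          +      // log file contains that txid, we have counted every available txn.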
          +      try {
          +        elis = jm.getInputStream(txId, inProgressOk);
          +      } catch (IOException e) {
          +        if (e.getMessage().startsWith("Cannot find editlog file containing ")) {
          +          break;
          +        } else {
          +          throw e;
          +        }
          +      }
          +      while (true) {
          +        FSEditLogOp op = elis.readOp();
          +        if (op == null) {
          +          break;
          +        }
          +        txId = op.getTransactionId();
          +        numTransactions++;
          +      }
          +      txId++;
          +    }
          +    return numTransactions;
          +  }
          +  
             /** 
              * Test the normal operation of loading transactions from
              * file journal manager. 3 edits directories are setup without any
          @@ -61,7 +99,7 @@ public class TestFileJournalManager {
               long numJournals = 0;
               for (StorageDirectory sd : storage.dirIterable(NameNodeDirType.EDITS)) {
                 FileJournalManager jm = new FileJournalManager(sd, storage);
          -      assertEquals(6*TXNS_PER_ROLL, jm.getNumberOfTransactions(1, true));
          +      assertEquals(6*TXNS_PER_ROLL, getNumberOfTransactions(jm, 1, true));
                 numJournals++;
               }
               assertEquals(3, numJournals);
          @@ -82,7 +120,7 @@ public class TestFileJournalManager {
           
               FileJournalManager jm = new FileJournalManager(sd, storage);
               assertEquals(5*TXNS_PER_ROLL + TXNS_PER_FAIL, 
          -                 jm.getNumberOfTransactions(1, true));
          +                 getNumberOfTransactions(jm, 1, true));
             }
           
             /**
          @@ -104,16 +142,16 @@ public class TestFileJournalManager {
               Iterator<StorageDirectory> dirs = storage.dirIterator(NameNodeDirType.EDITS);
               StorageDirectory sd = dirs.next();
               FileJournalManager jm = new FileJournalManager(sd, storage);
          -    assertEquals(6*TXNS_PER_ROLL, jm.getNumberOfTransactions(1, true));
          +    assertEquals(6*TXNS_PER_ROLL, getNumberOfTransactions(jm, 1, true));
               
               sd = dirs.next();
               jm = new FileJournalManager(sd, storage);
          -    assertEquals(5*TXNS_PER_ROLL + TXNS_PER_FAIL, jm.getNumberOfTransactions(1,
          +    assertEquals(5*TXNS_PER_ROLL + TXNS_PER_FAIL, getNumberOfTransactions(jm, 1,
                   true));
           
               sd = dirs.next();
               jm = new FileJournalManager(sd, storage);
          -    assertEquals(6*TXNS_PER_ROLL, jm.getNumberOfTransactions(1, true));
          +    assertEquals(6*TXNS_PER_ROLL, getNumberOfTransactions(jm, 1, true));
             }
           
             /** 
          @@ -137,17 +175,17 @@ public class TestFileJournalManager {
               Iterator<StorageDirectory> dirs = storage.dirIterator(NameNodeDirType.EDITS);
               StorageDirectory sd = dirs.next();
               FileJournalManager jm = new FileJournalManager(sd, storage);
          -    assertEquals(5*TXNS_PER_ROLL + TXNS_PER_FAIL, jm.getNumberOfTransactions(1,
          +    assertEquals(5*TXNS_PER_ROLL + TXNS_PER_FAIL, getNumberOfTransactions(jm, 1,
                   true));
               
               sd = dirs.next();
               jm = new FileJournalManager(sd, storage);
          -    assertEquals(5*TXNS_PER_ROLL + TXNS_PER_FAIL, jm.getNumberOfTransactions(1,
          +    assertEquals(5*TXNS_PER_ROLL + TXNS_PER_FAIL, getNumberOfTransactions(jm, 1,
                   true));
           
               sd = dirs.next();
               jm = new FileJournalManager(sd, storage);
          -    assertEquals(5*TXNS_PER_ROLL + TXNS_PER_FAIL, jm.getNumberOfTransactions(1,
          +    assertEquals(5*TXNS_PER_ROLL + TXNS_PER_FAIL, getNumberOfTransactions(jm, 1,
                   true));
             }
           
          @@ -198,17 +236,17 @@ public class TestFileJournalManager {
           
               FileJournalManager jm = new FileJournalManager(sd, storage);
               long expectedTotalTxnCount = TXNS_PER_ROLL*10 + TXNS_PER_FAIL;
          -    assertEquals(expectedTotalTxnCount, jm.getNumberOfTransactions(1, true));
          +    assertEquals(expectedTotalTxnCount, getNumberOfTransactions(jm, 1, true));
           
               long skippedTxns = (3*TXNS_PER_ROLL); // skip first 3 files
               long startingTxId = skippedTxns + 1; 
           
          -    long numTransactionsToLoad = jm.getNumberOfTransactions(startingTxId, true);
          +    long numTransactionsToLoad = getNumberOfTransactions(jm, startingTxId, true);
               long numLoaded = 0;
               while (numLoaded < numTransactionsToLoad) {
                 EditLogInputStream editIn = jm.getInputStream(startingTxId, true);
                 FSEditLogLoader.EditLogValidation val = FSEditLogLoader.validateEditLog(editIn);
          -      long count = val.getNumTransactions();
          +      long count = (val.getEndTxId() - startingTxId) + 1;
           
                 editIn.close();
                 startingTxId += count;
          @@ -236,7 +274,7 @@ public class TestFileJournalManager {
               // 10 rolls, so 11 rolled files, 110 txids total.
               final int TOTAL_TXIDS = 10 * 11;
               for (int txid = 1; txid <= TOTAL_TXIDS; txid++) {
          -      assertEquals((TOTAL_TXIDS - txid) + 1, jm.getNumberOfTransactions(txid,
          +      assertEquals((TOTAL_TXIDS - txid) + 1, getNumberOfTransactions(jm, txid,
                     true));
               }
             }
          @@ -269,10 +307,10 @@ public class TestFileJournalManager {
               assertTrue(files[0].delete());
               
               FileJournalManager jm = new FileJournalManager(sd, storage);
          -    assertEquals(startGapTxId-1, jm.getNumberOfTransactions(1, true));
          +    assertEquals(startGapTxId-1, getNumberOfTransactions(jm, 1, true));
           
               try {
          -      jm.getNumberOfTransactions(startGapTxId, true);
          +      getNumberOfTransactions(jm, startGapTxId, true);
                 fail("Should have thrown an exception by now");
               } catch (IOException ioe) {
                 GenericTestUtils.assertExceptionContains(
          @@ -281,7 +319,7 @@ public class TestFileJournalManager {
           
               // rolled 10 times so there should be 11 files.
               assertEquals(11*TXNS_PER_ROLL - endGapTxId, 
          -                 jm.getNumberOfTransactions(endGapTxId + 1, true));
          +                 getNumberOfTransactions(jm, endGapTxId + 1, true));
             }
           
             /** 
          @@ -308,7 +346,7 @@ public class TestFileJournalManager {
           
               FileJournalManager jm = new FileJournalManager(sd, storage);
               assertEquals(10*TXNS_PER_ROLL+1, 
          -                 jm.getNumberOfTransactions(1, true));
          +                 getNumberOfTransactions(jm, 1, true));
             }
           
             @Test
          @@ -381,7 +419,7 @@ public class TestFileJournalManager {
               FileJournalManager jm = new FileJournalManager(sd, storage);
               
               // If we exclude the in-progess stream, we should only have 100 tx.
          -    assertEquals(100, jm.getNumberOfTransactions(1, false));
          +    assertEquals(100, getNumberOfTransactions(jm, 1, false));
               
               EditLogInputStream elis = jm.getInputStream(90, false);
               FSEditLogOp lastReadOp = null;
          diff --git hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGenericJournalConf.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGenericJournalConf.java
          index 51e49a9..f677557 100644
          --- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGenericJournalConf.java
          +++ hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGenericJournalConf.java
          @@ -150,12 +150,6 @@ public class TestGenericJournalConf {
               }
           
               @Override
          -    public long getNumberOfTransactions(long fromTxnId, boolean inProgressOk)
          -        throws IOException {
          -      return 0;
          -    }
          -
          -    @Override
               public void setOutputBufferCapacity(int size) {}
           
               @Override
          diff --git hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRecovery.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRecovery.java
          index 5a86fbf..48f8f95 100644
          --- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRecovery.java
          +++ hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRecovery.java
          @@ -333,7 +333,7 @@ public class TestNameNodeRecovery {
             static void testNameNodeRecoveryImpl(Corruptor corruptor, boolean finalize)
                 throws IOException {
               final String TEST_PATH = "/test/path/dir";
          -    final int NUM_TEST_MKDIRS = 10;
          +    final String TEST_PATH2 = "/second/dir";
               final boolean needRecovery = corruptor.needRecovery(finalize);
           
               // start a cluster
          @@ -342,8 +342,8 @@ public class TestNameNodeRecovery {
               FileSystem fileSys = null;
               StorageDirectory sd = null;
               try {
          -      cluster = new MiniDFSCluster.Builder(conf).numDataNodes(0)
          -          .build();
          +      cluster = new MiniDFSCluster.Builder(conf).numDataNodes(0).
          +          enableManagedDfsDirsRedundancy(false).build();
                 cluster.waitActive();
                 if (!finalize) {
                   // Normally, the in-progress edit log would be finalized by
          @@ -357,9 +357,8 @@ public class TestNameNodeRecovery {
                 fileSys = cluster.getFileSystem();
                 final FSNamesystem namesystem = cluster.getNamesystem();
                 FSImage fsimage = namesystem.getFSImage();
          -      for (int i = 0; i < NUM_TEST_MKDIRS; i++) {
          -        fileSys.mkdirs(new Path(TEST_PATH));
          -      }
          +      fileSys.mkdirs(new Path(TEST_PATH));
          +      fileSys.mkdirs(new Path(TEST_PATH2));
                 sd = fsimage.getStorage().dirIterator(NameNodeDirType.EDITS).next();
               } finally {
                 if (cluster != null) {
          @@ -371,6 +370,7 @@ public class TestNameNodeRecovery {
               assertTrue("Should exist: " + editFile, editFile.exists());
           
               // Corrupt the edit log
          +    LOG.info("corrupting edit log file '" + editFile + "'");
               corruptor.corrupt(editFile);
           
               // If needRecovery == true, make sure that we can't start the
          @@ -378,8 +378,8 @@ public class TestNameNodeRecovery {
               cluster = null;
               try {
                 LOG.debug("trying to start normally (this should fail)...");
          -      cluster = new MiniDFSCluster.Builder(conf).numDataNodes(0)
          -          .format(false).build();
          +      cluster = new MiniDFSCluster.Builder(conf).numDataNodes(0).
          +          enableManagedDfsDirsRedundancy(false).format(false).build();
                 cluster.waitActive();
                 cluster.shutdown();
                 if (needRecovery) {
          @@ -403,8 +403,9 @@ public class TestNameNodeRecovery {
               cluster = null;
               try {
                 LOG.debug("running recovery...");
          -      cluster = new MiniDFSCluster.Builder(conf).numDataNodes(0)
          -          .format(false).startupOption(recoverStartOpt).build();
          +      cluster = new MiniDFSCluster.Builder(conf).numDataNodes(0).
          +          enableManagedDfsDirsRedundancy(false).format(false).
          +          startupOption(recoverStartOpt).build();
               } catch (IOException e) {
                 fail("caught IOException while trying to recover. " +
                     "message was " + e.getMessage() +
          @@ -420,7 +421,7 @@ public class TestNameNodeRecovery {
               try {
                 LOG.debug("starting cluster normally after recovery...");
                 cluster = new MiniDFSCluster.Builder(conf).numDataNodes(0)
          -          .format(false).build();
          +          .enableManagedDfsDirsRedundancy(false).format(false).build();
                 LOG.debug("successfully recovered the " + corruptor.getName() +
                     " corrupted edit log");
                 assertTrue(cluster.getFileSystem().exists(new Path(TEST_PATH)));
          
          Colin Patrick McCabe added a comment - Let's see if this text box can take what I'm dishing out. As you know, the attachment system is still broken. diff --git hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperEditLogInputStream.java hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperEditLogInputStream.java index 9d070d9..07dd179 100644 --- hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperEditLogInputStream.java +++ hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperEditLogInputStream.java @@ -79,12 +79,12 @@ class BookKeeperEditLogInputStream extends EditLogInputStream { } @Override - public long getFirstTxId() throws IOException { + public long getFirstTxId() { return firstTxId; } @Override - public long getLastTxId() throws IOException { + public long getLastTxId() { return lastTxId; } diff --git hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java index 047efd5..f00f0c2 100644 --- hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java +++ hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java @@ -332,7 +332,6 @@ public class BookKeeperJournalManager implements JournalManager { } // TODO(HA): Handle inProgressOk - @Override public long getNumberOfTransactions( long fromTxnId, boolean inProgressOk) throws IOException { long count = 0; diff --git hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogTestUtil.java hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogTestUtil.java index 41f0292..a46f9cf 100644 --- hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogTestUtil.java +++ hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogTestUtil.java @@ -34,6 +34,6 @@ public class FSEditLogTestUtil { public static long countTransactionsInStream(EditLogInputStream in) throws IOException { FSEditLogLoader.EditLogValidation validation = FSEditLogLoader.validateEditLog(in); - return validation.getNumTransactions(); + return (validation.getEndTxId() - in.getFirstTxId()) + 1; } } diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java index a9aa20d..51e2728 100644 --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java @@ -207,7 +207,7 @@ public class BackupImage extends FSImage { int logVersion = storage.getLayoutVersion(); backupInputStream.setBytes(data, logVersion); - long numTxnsAdvanced = logLoader.loadEditRecords(logVersion, + long numTxnsAdvanced = logLoader.loadEditRecords( backupInputStream, true , 
lastAppliedTxId + 1, null ); if (numTxnsAdvanced != numTxns) { throw new IOException( "Batch of txns starting at txnid " + diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java index ebf4f48..97d93f8 100644 --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java @@ -60,14 +60,6 @@ class BackupJournalManager implements JournalManager { } @Override - public long getNumberOfTransactions( long fromTxnId, boolean inProgressOk) - throws IOException, CorruptionException { - // This JournalManager is never used for input. Therefore it cannot - // return any transactions - return 0; - } - - @Override public EditLogInputStream getInputStream( long fromTxnId, boolean inProgressOk) throws IOException { // This JournalManager is never used for input. Therefore it cannot diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogBackupInputStream.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogBackupInputStream.java index 1f514cd..e8747ff 100644 --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogBackupInputStream.java +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogBackupInputStream.java @@ -129,12 +129,12 @@ class EditLogBackupInputStream extends EditLogInputStream { } @Override - public long getFirstTxId() throws IOException { + public long getFirstTxId() { return HdfsConstants.INVALID_TXID; } @Override - public long getLastTxId() throws IOException { + public long getLastTxId() { return HdfsConstants.INVALID_TXID; } diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java index 29c90e9..53f1d72 100644 --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java @@ -90,12 +90,12 @@ public class EditLogFileInputStream extends EditLogInputStream { } @Override - public long getFirstTxId() throws IOException { + public long getFirstTxId() { return firstTxId; } @Override - public long getLastTxId() throws IOException { + public long getLastTxId() { return lastTxId; } @@ -186,7 +186,7 @@ public class EditLogFileInputStream extends EditLogInputStream { FSImage.LOG.warn( "Log at " + file + " has no valid header" , corrupt); return new FSEditLogLoader.EditLogValidation(0, - HdfsConstants.INVALID_TXID, HdfsConstants.INVALID_TXID, true ); + HdfsConstants.INVALID_TXID, true ); } try { diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogInputStream.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogInputStream.java index c2b42be..9504d68 100644 --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogInputStream.java +++ 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogInputStream.java @@ -45,12 +45,12 @@ public abstract class EditLogInputStream implements Closeable { /** * @ return the first transaction which will be found in this stream */ - public abstract long getFirstTxId() throws IOException; + public abstract long getFirstTxId(); /** * @ return the last transaction which will be found in this stream */ - public abstract long getLastTxId() throws IOException; + public abstract long getLastTxId(); /** @@ -73,14 +73,14 @@ public abstract class EditLogInputStream implements Closeable { } return nextOp(); } - + /** * Position the stream so that a valid operation can be read from it with * readOp(). * * This method can be used to skip over corrupted sections of edit logs. */ - public void resync() throws IOException { + public void resync() { if (cachedOp != null ) { return ; } @@ -109,6 +109,8 @@ public abstract class EditLogInputStream implements Closeable { // error recovery will want to override this . try { return nextOp(); + } catch (RuntimeException e) { + return null ; } catch (IOException e) { return null ; } diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java index a48e5a6..a86cc79 100644 --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java @@ -90,7 +90,7 @@ public class FSEditLogLoader { fsNamesys.writeLock(); try { long startTime = now(); - long numEdits = loadEditRecords(logVersion, edits, false , + long numEdits = loadEditRecords(edits, false , expectedStartingTxId, recovery); FSImage.LOG.info( "Edits file " + edits.getName() + " of size " + edits.length() + " edits # " + numEdits @@ -102,7 +102,7 @@ public class FSEditLogLoader { } } - long loadEditRecords( int logVersion, EditLogInputStream in, boolean closeOnExit, + long loadEditRecords(EditLogInputStream in, boolean closeOnExit, long expectedStartingTxId, MetaRecoveryContext recovery) throws IOException { FSDirectory fsDir = fsNamesys.dir; @@ -141,7 +141,7 @@ public class FSEditLogLoader { } } catch (Throwable e) { // Handle a problem with our input - check203UpgradeFailure(logVersion, e); + check203UpgradeFailure(in.getVersion(), e); String errorMessage = formatEditLogReplayError(in, recentOpcodeOffsets, expectedTxId); FSImage.LOG.error(errorMessage, e); @@ -158,7 +158,7 @@ public class FSEditLogLoader { } recentOpcodeOffsets[( int )(numEdits % recentOpcodeOffsets.length)] = in.getPosition(); - if (LayoutVersion.supports(Feature.STORED_TXIDS, logVersion)) { + if (op.hasTransactionId()) { if (op.getTransactionId() > expectedTxId) { MetaRecoveryContext.editLogLoaderPrompt( "There appears " + "to be a gap in the edit log. 
We expected txid " + @@ -175,7 +175,7 @@ public class FSEditLogLoader { } } try { - applyEditLogOp(op, fsDir, logVersion); + applyEditLogOp(op, fsDir, in.getVersion()); } catch (Throwable e) { LOG.error( "Encountered exception on operation " + op, e); MetaRecoveryContext.editLogLoaderPrompt( "Failed to " + @@ -192,7 +192,7 @@ public class FSEditLogLoader { expectedTxId = lastAppliedTxId = expectedStartingTxId; } // log progress - if (LayoutVersion.supports(Feature.STORED_TXIDS, logVersion)) { + if (op.hasTransactionId()) { long now = now(); if (now - lastLogTime > REPLAY_TRANSACTION_LOG_INTERVAL) { int percent = Math .round(( float )lastAppliedTxId / numTxns * 100); @@ -647,76 +647,59 @@ public class FSEditLogLoader { } /** - * Return the number of valid transactions in the stream. If the stream is - * truncated during the header, returns a value indicating that there are - * 0 valid transactions. This reads through the stream but does not close - * it. + * Find the last valid transaction ID in the stream. + * If there are invalid or corrupt transactions in the middle of the stream, + * validateEditLog will skip over them. + * This reads through the stream but does not close it. + * * @ throws IOException if the stream cannot be read due to an IO error (eg * if the log does not exist) */ static EditLogValidation validateEditLog(EditLogInputStream in) { long lastPos = 0; - long firstTxId = HdfsConstants.INVALID_TXID; long lastTxId = HdfsConstants.INVALID_TXID; long numValid = 0; - try { - FSEditLogOp op = null ; - while ( true ) { - lastPos = in.getPosition(); + FSEditLogOp op = null ; + while ( true ) { + lastPos = in.getPosition(); + try { if ((op = in.readOp()) == null ) { break ; } - if (firstTxId == HdfsConstants.INVALID_TXID) { - firstTxId = op.getTransactionId(); - } - if (lastTxId == HdfsConstants.INVALID_TXID - || op.getTransactionId() == lastTxId + 1) { - lastTxId = op.getTransactionId(); - } else { - FSImage.LOG.error( "Out of order txid found. Found " + - op.getTransactionId() + ", expected " + (lastTxId + 1)); - break ; - } - numValid++; + } catch (Throwable t) { + FSImage.LOG.warn( "Caught exception after reading " + numValid + + " ops from " + in + " while determining its valid length." + + "Position was " + lastPos, t); + in.resync(); + FSImage.LOG.warn( "After resync, position is " + in.getPosition()); + continue ; + } + if (lastTxId == HdfsConstants.INVALID_TXID + || op.getTransactionId() > lastTxId) { + lastTxId = op.getTransactionId(); } - } catch (Throwable t) { - // Catch Throwable and not just IOE, since bad edits may generate - // NumberFormatExceptions, AssertionErrors, OutOfMemoryErrors, etc. - FSImage.LOG.debug( "Caught exception after reading " + numValid + - " ops from " + in + " while determining its valid length." 
, t); + numValid++; } - return new EditLogValidation(lastPos, firstTxId, lastTxId, false ); + return new EditLogValidation(lastPos, lastTxId, false ); } - + static class EditLogValidation { private final long validLength; - private final long startTxId; private final long endTxId; - private final boolean corruptionDetected; - - EditLogValidation( long validLength, long startTxId, long endTxId, - boolean corruptionDetected) { + private final boolean hasCorruptHeader; + + EditLogValidation( long validLength, long endTxId, + boolean hasCorruptHeader) { this .validLength = validLength; - this .startTxId = startTxId; this .endTxId = endTxId; - this .corruptionDetected = corruptionDetected; + this .hasCorruptHeader = hasCorruptHeader; } - + long getValidLength() { return validLength; } - - long getStartTxId() { return startTxId; } - + long getEndTxId() { return endTxId; } - - long getNumTransactions() { - if (endTxId == HdfsConstants.INVALID_TXID - || startTxId == HdfsConstants.INVALID_TXID) { - return 0; - } - return (endTxId - startTxId) + 1; - } - - boolean hasCorruptHeader() { return corruptionDetected; } + + boolean hasCorruptHeader() { return hasCorruptHeader; } } /** diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java index c5d4195..261e71a 100644 --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java @@ -561,7 +561,7 @@ public class FSImage implements Closeable { /** * Choose latest image from one of the directories, - * load it and merge with the edits from that directory. + * load it and merge with the edits. * * Saving and loading fsimage should never trigger symlink resolution. * The paths that are persisted do not have *intermediate* symlinks diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java index 3767111..c7b9f5c 100644 --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java @@ -232,10 +232,10 @@ class FileJournalManager implements JournalManager { LOG.info( String .format( "Log begins at txid %d, but requested start " + "txid is %d. Skipping %d edits." , elf.getFirstTxId(), fromTxId, transactionsToSkip)); - } - if (elfis.skipUntil(fromTxId) == false ) { - throw new IOException( "failed to advance input stream to txid " + - fromTxId); + if (elfis.skipUntil(fromTxId) == false ) { + throw new IOException( "failed to advance input stream to txid " + + fromTxId); + } } return elfis; } @@ -245,60 +245,6 @@ class FileJournalManager implements JournalManager { } @Override - public long getNumberOfTransactions( long fromTxId, boolean inProgressOk) - throws IOException, CorruptionException { - long numTxns = 0L; - - for (EditLogFile elf : getLogFiles(fromTxId)) { - if (LOG.isTraceEnabled()) { - LOG.trace( "Counting " + elf); - } - if (elf.getFirstTxId() > fromTxId) { // there must be a gap - LOG.warn( "Gap in transactions in " + sd.getRoot() + ". 
Gap is " - + fromTxId + " - " + (elf.getFirstTxId() - 1)); - break ; - } else if (elf.containsTxId(fromTxId)) { - if (!inProgressOk && elf.isInProgress()) { - break ; - } - - if (elf.isInProgress()) { - elf.validateLog(); - } - - if (elf.hasCorruptHeader()) { - break ; - } - numTxns += elf.getLastTxId() + 1 - fromTxId; - fromTxId = elf.getLastTxId() + 1; - - if (elf.isInProgress()) { - break ; - } - } - } - - if (LOG.isDebugEnabled()) { - LOG.debug( "Journal " + this + " has " + numTxns - + " txns from " + fromTxId); - } - - long max = findMaxTransaction(inProgressOk); - - // fromTxId should be greater than max, as it points to the next - // transaction we should expect to find. If it is less than or equal - // to max, it means that a transaction with txid == max has not been found - if (numTxns == 0 && fromTxId <= max) { - String error = String .format( "Gap in transactions, max txnid is %d" - + ", 0 txns from %d" , max, fromTxId); - LOG.error(error); - throw new CorruptionException(error); - } - - return numTxns; - } - - @Override synchronized public void recoverUnfinalizedSegments() throws IOException { File currentDir = sd.getCurrentDir(); LOG.info( "Recovering unfinalized segments in " + currentDir); @@ -318,7 +264,7 @@ class FileJournalManager implements JournalManager { } continue ; } - + elf.validateLog(); if (elf.hasCorruptHeader()) { @@ -326,19 +272,16 @@ class FileJournalManager implements JournalManager { throw new CorruptionException( "In-progress edit log file is corrupt: " + elf); } - - // If the file has a valid header (isn't corrupt) but contains no - // transactions, we likely just crashed after opening the file and - // writing the header, but before syncing any transactions. Safe to - // delete the file. - if (elf.getNumTransactions() == 0) { - LOG.info( "Deleting edit log file with zero transactions " + elf); - if (!elf.getFile().delete()) { - throw new IOException( "Unable to delete " + elf.getFile()); - } + if (elf.getLastTxId() == HdfsConstants.INVALID_TXID) { + // If the file has a valid header (isn't corrupt) but contains no + // transactions, we likely just crashed after opening the file and + // writing the header, but before syncing any transactions. Safe to + // delete the file. + LOG.error( "Moving aside edit log file that seems to have zero " + + "transactions " + elf); + elf.moveAsideEmptyFile(); continue ; } - finalizeLogSegment(elf.getFirstTxId(), elf.getLastTxId()); } } @@ -361,39 +304,6 @@ class FileJournalManager implements JournalManager { return logFiles; } - /** - * Find the maximum transaction in the journal. - */ - private long findMaxTransaction( boolean inProgressOk) - throws IOException { - boolean considerSeenTxId = true ; - long seenTxId = NNStorage.readTransactionIdFile(sd); - long maxSeenTransaction = 0; - for (EditLogFile elf : getLogFiles(0)) { - if (elf.isInProgress() && !inProgressOk) { - if (elf.getFirstTxId() != HdfsConstants.INVALID_TXID && - elf.getFirstTxId() <= seenTxId) { - // don't look at the seen_txid file if in-progress logs are not to be - // examined, and the value in seen_txid falls within the in-progress - // segment. 
- considerSeenTxId = false ; - } - continue ; - } - - if (elf.isInProgress()) { - maxSeenTransaction = Math .max(elf.getFirstTxId(), maxSeenTransaction); - elf.validateLog(); - } - maxSeenTransaction = Math .max(elf.getLastTxId(), maxSeenTransaction); - } - if (considerSeenTxId) { - return Math .max(maxSeenTransaction, seenTxId); - } else { - return maxSeenTransaction; - } - } - @Override public String toString() { return String .format( "FileJournalManager(root=%s)" , sd.getRoot()); @@ -406,7 +316,6 @@ class FileJournalManager implements JournalManager { private File file; private final long firstTxId; private long lastTxId; - private long numTx = -1; private boolean hasCorruptHeader = false ; private final boolean isInProgress; @@ -454,20 +363,15 @@ class FileJournalManager implements JournalManager { } /** - * Count the number of valid transactions in a log. + * Find out where the edit log ends. * This will update the lastTxId of the EditLogFile or * mark it as corrupt if it is. */ void validateLog() throws IOException { EditLogValidation val = EditLogFileInputStream.validateEditLog(file); - this .numTx = val.getNumTransactions(); this .lastTxId = val.getEndTxId(); this .hasCorruptHeader = val.hasCorruptHeader(); } - - long getNumTransactions() { - return numTx; - } boolean isInProgress() { return isInProgress; @@ -483,23 +387,31 @@ class FileJournalManager implements JournalManager { void moveAsideCorruptFile() throws IOException { assert hasCorruptHeader; - + renameSelf( ".corrupt" ); + } + + void moveAsideEmptyFile() throws IOException { + assert lastTxId == HdfsConstants.INVALID_TXID; + renameSelf( ".empty" ); + } + + private void renameSelf( String newSuffix) throws IOException { File src = file; - File dst = new File(src.getParent(), src.getName() + ".corrupt" ); + File dst = new File(src.getParent(), src.getName() + newSuffix); boolean success = src.renameTo(dst); if (!success) { throw new IOException( - "Couldn't rename corrupt log " + src + " to " + dst); + "Couldn't rename log " + src + " to " + dst); } file = dst; } - + @Override public String toString() { return String .format( "EditLogFile(file=%s,first=%019d,last=%019d," - + "inProgress=%b,hasCorruptHeader=%b,numTx=%d)" , + + "inProgress=%b,hasCorruptHeader=%b)" , file.toString(), firstTxId, lastTxId, - isInProgress(), hasCorruptHeader, numTx); + isInProgress(), hasCorruptHeader); } } } diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalManager.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalManager.java index f9c622d..390c38c 100644 --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalManager.java +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalManager.java @@ -56,18 +56,6 @@ public interface JournalManager extends Closeable { throws IOException; /** - * Get the number of transaction contiguously available from fromTxnId. - * - * @param fromTxnId Transaction id to count from - * @param inProgressOk whether or not in-progress streams should be counted - * @ return The number of transactions available from fromTxnId - * @ throws IOException if the journal cannot be read. - * @ throws CorruptionException if there is a gap in the journal at fromTxnId. 
- */ - long getNumberOfTransactions( long fromTxnId, boolean inProgressOk) - throws IOException, CorruptionException; - - /** * Set the amount of memory that this stream should use to buffer edits */ void setOutputBufferCapacity( int size); diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java index d84d79d..391304d 100644 --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java @@ -20,6 +20,8 @@ package org.apache.hadoop.hdfs.server.namenode; import java.io.IOException; import java.util.ArrayList; import java.util.Collections; + import java.util.Comparator; + import java.util.LinkedList; import java.util.List; import java.util.SortedSet; @@ -28,6 +30,7 @@ import org.apache.commons.logging.LogFactory; import org.apache.hadoop.classification.InterfaceAudience; import org.apache.hadoop.hdfs.server.protocol.RemoteEditLog; import org.apache.hadoop.hdfs.server.protocol.RemoteEditLogManifest; + import org.apache.hadoop.io.MultipleIOException; import com.google.common.annotations.VisibleForTesting; import com.google.common.base.Preconditions; @@ -193,76 +196,67 @@ public class JournalSet implements JournalManager { } }, "close journal" ); } - /** - * Find the best editlog input stream to read from txid. - * If a journal throws an CorruptionException while reading from a txn id, - * it means that it has more transactions, but can't find any from fromTxId. - * If this is the case and no other journal has transactions, we should throw - * an exception as it means more transactions exist, we just can't load them. + * Get an input stream that can supply transactions starting at txid. + * If there are multiple edit logs present, we will merge them into a single + * RedundantEditLogInputStream. + * + * If the only edit logs available starting at txid are corrupt, we'll throw an + * IOException. This means that more transactions exist, but we can't load + * them. * * @param fromTxnId Transaction id to start from. - * @ return A edit log input stream with tranactions fromTxId - * or null if no more exist + * @ return An edit log input stream with tranactions fromTxId, or null if + * there are no files which cover this range. 
*/ @Override public EditLogInputStream getInputStream( long fromTxnId, boolean inProgressOk) throws IOException { - JournalManager bestjm = null ; - long bestjmNumTxns = 0; - CorruptionException corruption = null ; + LinkedList<EditLogInputStream> streams = new LinkedList<EditLogInputStream>(); + LinkedList<IOException> corruptions = new LinkedList<IOException>(); for (JournalAndStream jas : journals) { if (jas.isDisabled()) continue ; - + JournalManager candidate = jas.getManager(); long candidateNumTxns = 0; + EditLogInputStream elis; try { - candidateNumTxns = candidate.getNumberOfTransactions(fromTxnId, - inProgressOk); + elis = candidate.getInputStream(fromTxnId, inProgressOk); + if (elis == null ) { + LOG.warn( "No input stream found for JournalManager " + candidate + + ", txid " + fromTxnId); + continue ; + } } catch (CorruptionException ce) { - corruption = ce; + LOG.warn( "JournalManager " + candidate + " encountered a " + + "CorruptionException while looking for txid " + fromTxnId, ce); + corruptions.add(ce); + continue ; } catch (IOException ioe) { - LOG.warn( "Unable to read input streams from JournalManager " + candidate, - ioe); - continue ; // error reading disk, just skip - } - - if (candidateNumTxns > bestjmNumTxns) { - bestjm = candidate; - bestjmNumTxns = candidateNumTxns; + LOG.warn( "Unable to read input stream from JournalManager " + + candidate + ", txid " + fromTxnId, ioe); + continue ; } + + LOG.info( "Examined edit log '" + elis.getName() + "'; it starts at " + + elis.getFirstTxId() + " and ends at " + elis.getLastTxId() + "." ); + streams.add(elis); } - - if (bestjm == null ) { - if (corruption != null ) { - throw new IOException( "No non-corrupt logs for txid " - + fromTxnId, corruption); - } else { + + if (streams.isEmpty()) { + if (corruptions.isEmpty()) { return null ; - } - } - return bestjm.getInputStream(fromTxnId, inProgressOk); - } - - @Override - public long getNumberOfTransactions( long fromTxnId, boolean inProgressOk) - throws IOException { - long num = 0; - for (JournalAndStream jas: journals) { - if (jas.isDisabled()) { - LOG.info( "Skipping jas " + jas + " since it's disabled" ); - continue ; } else { - long newNum = jas.getManager().getNumberOfTransactions(fromTxnId, - inProgressOk); - if (newNum > num) { - num = newNum; - } + throw new IOException( "No non-corrupt logs for txid " + fromTxnId, + MultipleIOException.createIOException(corruptions)); } + } else if (streams.size() == 1) { + return streams.get(0); + } else { + return new RedundantEditLogInputStream(streams); } - return num; } /** diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/RedundantEditLogInputStream.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/RedundantEditLogInputStream.java new file mode 100644 index 0000000..63ba506 --- /dev/ null +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/RedundantEditLogInputStream.java @@ -0,0 +1,223 @@ + package org.apache.hadoop.hdfs.server.namenode; + + import java.io.IOException; + import java.util.Arrays; + import java.util.Collections; + import java.util.Comparator; + import java.util.LinkedList; + import java.util.List; + + import org.apache.commons.lang.StringUtils; + import org.apache.commons.logging.Log; + import org.apache.commons.logging.LogFactory; + import org.apache.hadoop.hdfs.protocol.HdfsConstants; + + import com.google.common.base.Preconditions; + import com.google.common.primitives.Longs; + +/** + * A 
merged input stream that handles failover between different edit logs. + */ + public class RedundantEditLogInputStream extends EditLogInputStream { + public static final Log LOG = LogFactory.getLog(EditLogInputStream.class.getName()); + private int curIdx; + private long prevTxId; + private final EditLogInputStream[] streams; + /** + * States that the RedundantEditLogInputStream can be in. + */ + static private enum State { + /** We need to skip until prevTxId + 1 */ + SKIP_UNTIL, + /** We're ready to read opcodes out of the current stream */ + OK, + /** The current stream has failed. */ + STREAM_FAILED, + /** The current stream has failed, and resync() was called. */ + STREAM_FAILED_RESYNC, + /** There are no more opcodes to read from this + * RedundantEditLogInputStream */ + EOF; + } + private State state; + private IOException prevException; + + RedundantEditLogInputStream(List<EditLogInputStream> streams) throws IOException { + this .curIdx = 0; + this .prevTxId = HdfsConstants.INVALID_TXID; + + if (streams.isEmpty()) { + this .state = State.EOF; + } else { + this .state = State.OK; + } + prevException = null ; + // EditLogInputStreams in a RedundantEditLogInputStream must be finalized, + // and can't be pre-transactional. + for (EditLogInputStream s : streams) { + Preconditions.checkArgument(s.getFirstTxId() != + HdfsConstants.INVALID_TXID); + Preconditions.checkArgument(s.getLastTxId() != + HdfsConstants.INVALID_TXID); + Preconditions.checkArgument(!s.isInProgress()); + } + + this .streams = streams.toArray( new EditLogInputStream[0]); + + /* We sort the streams here so that the streams that end later come first. + */ + Arrays.sort( this .streams, new Comparator<EditLogInputStream>() { + @Override + public int compare(EditLogInputStream a, EditLogInputStream b) { + return Longs.compare(b.getLastTxId(), a.getLastTxId()); + } + }); + } + + @Override + public String getName() { + StringBuilder bld = new StringBuilder(); + String prefix = ""; + for (EditLogInputStream s : streams) { + bld.append(prefix); + bld.append(s.getName()); + prefix = ", " ; + } + return bld.toString(); + } + + @Override + public long getFirstTxId() { + return streams[curIdx].getFirstTxId(); + } + + @Override + public long getLastTxId() { + return streams[curIdx].getLastTxId(); + } + + @Override + public void close() throws IOException { + LinkedList<Throwable> exceptions = new LinkedList<Throwable>(); + for (EditLogInputStream s : streams) { + try { + s.close(); + } catch (Throwable t) { + LOG.error(t); + exceptions.add(t); + } + } + if (!exceptions.isEmpty()) { + throw new IOException( "errors while closing " + + "EditLogInputStreams: " + + StringUtils.join(exceptions.toArray(), ',')); + } + } + + @Override + protected FSEditLogOp nextValidOp() { + try { + if (state == State.STREAM_FAILED) { + state = State.STREAM_FAILED_RESYNC; + } + return nextOp(); + } catch (IOException e) { + return null ; + } + } + + @Override + protected FSEditLogOp nextOp() throws IOException { + while ( true ) { + switch (state) { + case SKIP_UNTIL: + try { + if (prevTxId != HdfsConstants.INVALID_TXID) { + LOG.info( "Fast-forwarding stream '" + streams[curIdx].getName() + + "' to transaction ID " + (prevTxId + 1)); + streams[curIdx].skipUntil(prevTxId + 1); + } + } catch (IOException e) { + prevException = e; + state = State.STREAM_FAILED; + } + state = State.OK; + break ; + case OK: + try { + FSEditLogOp op = streams[curIdx].readOp(); + if (op == null ) { + state = State.EOF; + if (streams[curIdx].getLastTxId() == prevTxId) { + return 
null ; + } else { + throw new IOException( "got premature end-of-file at txid " + + prevTxId + "; expected file to go up to " + + streams[curIdx].getLastTxId()); + } + } + prevTxId = op.getTransactionId(); + return op; + } catch (IOException e) { + prevException = e; + state = State.STREAM_FAILED; + } + break ; + case STREAM_FAILED: + if (curIdx + 1 == streams.length) { + throw prevException; + } + long oldLast = streams[curIdx].getLastTxId(); + long newLast = streams[curIdx + 1].getLastTxId(); + if (newLast < oldLast) { + throw new IOException( "We encountered an error reading " + + streams[curIdx].getName() + ". During automatic failover, " + + "we noticed that all of the remaining edit log streams are " + + "shorter than the current one! The best " + + "remaining edit log ends at transaction " + + newLast + ", but we thought we could read up to transaction " + + oldLast + ". If you continue , metadata will be lost forever!" ); + } + LOG.error( "Got error reading edit log input stream " + + streams[curIdx].getName(), prevException); + LOG.error( "failing over to edit log " + + streams[curIdx + 1].getName()); + curIdx++; + state = State.SKIP_UNTIL; + break ; + case STREAM_FAILED_RESYNC: + if (curIdx + 1 == streams.length) { + streams[curIdx].resync(); + } else { + LOG.error( "failing over to edit log " + + streams[curIdx + 1].getName()); + curIdx++; + } + state = State.SKIP_UNTIL; + break ; + case EOF: + return null ; + } + } + } + + @Override + public int getVersion() throws IOException { + return streams[curIdx].getVersion(); + } + + @Override + public long getPosition() { + return streams[curIdx].getPosition(); + } + + @Override + public long length() throws IOException { + return streams[curIdx].length(); + } + + @Override + public boolean isInProgress() { + return false ; + } +} diff --git hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java index b418fcf..c396d82 100644 --- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java +++ hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java @@ -131,6 +131,7 @@ public class MiniDFSCluster { private int numDataNodes = 1; private boolean format = true ; private boolean manageNameDfsDirs = true ; + private boolean enableManagedDfsDirsRedundancy = true ; private boolean manageDataDfsDirs = true ; private StartupOption option = null ; private String [] racks = null ; @@ -184,6 +185,14 @@ public class MiniDFSCluster { this .manageNameDfsDirs = val; return this ; } + + /** + * Default: true + */ + public Builder enableManagedDfsDirsRedundancy( boolean val) { + this .enableManagedDfsDirsRedundancy = val; + return this ; + } /** * Default: true @@ -286,6 +295,7 @@ public class MiniDFSCluster { builder.numDataNodes, builder.format, builder.manageNameDfsDirs, + builder.enableManagedDfsDirsRedundancy, builder.manageDataDfsDirs, builder.option, builder.racks, @@ -373,7 +383,7 @@ public class MiniDFSCluster { public MiniDFSCluster(Configuration conf, int numDataNodes, StartupOption nameNodeOperation) throws IOException { - this (0, conf, numDataNodes, false , false , false , nameNodeOperation, + this (0, conf, numDataNodes, false , false , false , false , nameNodeOperation, null , null , null ); } @@ -395,7 +405,8 @@ public class MiniDFSCluster { int numDataNodes, boolean format, String [] racks) throws IOException { - this (0, conf, numDataNodes, format, true , 
true , null , racks, null , null ); + this (0, conf, numDataNodes, format, true , true , true , null , + racks, null , null ); } /** @@ -417,7 +428,8 @@ public class MiniDFSCluster { int numDataNodes, boolean format, String [] racks, String [] hosts) throws IOException { - this (0, conf, numDataNodes, format, true , true , null , racks, hosts, null ); + this (0, conf, numDataNodes, format, true , true , true , null , + racks, hosts, null ); } /** @@ -450,8 +462,8 @@ public class MiniDFSCluster { boolean manageDfsDirs, StartupOption operation, String [] racks) throws IOException { - this (nameNodePort, conf, numDataNodes, format, manageDfsDirs, manageDfsDirs, - operation, racks, null , null ); + this (nameNodePort, conf, numDataNodes, format, manageDfsDirs, + manageDfsDirs, manageDfsDirs, operation, racks, null , null ); } /** @@ -485,7 +497,7 @@ public class MiniDFSCluster { String [] racks, long [] simulatedCapacities) throws IOException { this (nameNodePort, conf, numDataNodes, format, manageDfsDirs, manageDfsDirs, - operation, racks, null , simulatedCapacities); + manageDfsDirs, operation, racks, null , simulatedCapacities); } /** @@ -519,13 +531,15 @@ public class MiniDFSCluster { int numDataNodes, boolean format, boolean manageNameDfsDirs, + boolean enableManagedDfsDirsRedundancy, boolean manageDataDfsDirs, StartupOption operation, String [] racks, String hosts[], long [] simulatedCapacities) throws IOException { this .nameNodes = new NameNodeInfo[1]; // Single namenode in the cluster initMiniDFSCluster(conf, numDataNodes, format, - manageNameDfsDirs, manageDataDfsDirs, operation, racks, hosts, + manageNameDfsDirs, enableManagedDfsDirsRedundancy, manageDataDfsDirs, + operation, racks, hosts, simulatedCapacities, null , true , false , MiniDFSNNTopology.simpleSingleNN(nameNodePort, 0)); } @@ -533,6 +547,7 @@ public class MiniDFSCluster { private void initMiniDFSCluster( Configuration conf, int numDataNodes, boolean format, boolean manageNameDfsDirs, + boolean enableManagedDfsDirsRedundancy, boolean manageDataDfsDirs, StartupOption operation, String [] racks, String [] hosts, long [] simulatedCapacities, String clusterId, boolean waitSafeMode, boolean setupHostsFile, @@ -572,7 +587,8 @@ public class MiniDFSCluster { federation = nnTopology.isFederated(); createNameNodesAndSetConf( - nnTopology, manageNameDfsDirs, format, operation, clusterId, conf); + nnTopology, manageNameDfsDirs, enableManagedDfsDirsRedundancy, + format, operation, clusterId, conf); if (format) { if (data_dir.exists() && !FileUtil.fullyDelete(data_dir)) { @@ -593,7 +609,8 @@ public class MiniDFSCluster { } private void createNameNodesAndSetConf(MiniDFSNNTopology nnTopology, - boolean manageNameDfsDirs, boolean format, StartupOption operation, + boolean manageNameDfsDirs, boolean enableManagedDfsDirsRedundancy, + boolean format, StartupOption operation, String clusterId, Configuration conf) throws IOException { Preconditions.checkArgument(nnTopology.countNameNodes() > 0, @@ -650,7 +667,7 @@ public class MiniDFSCluster { Collection<URI> prevNNDirs = null ; int nnCounterForFormat = nnCounter; for (NNConf nn : nameservice.getNNs()) { - initNameNodeConf(conf, nsId, nn.getNnId(), manageNameDfsDirs, + initNameNodeConf(conf, nsId, nn.getNnId(), manageNameDfsDirs, manageNameDfsDirs, nnCounterForFormat); Collection<URI> namespaceDirs = FSNamesystem.getNamespaceDirs(conf); if (format) { @@ -682,7 +699,8 @@ public class MiniDFSCluster { // Start all Namenodes for (NNConf nn : nameservice.getNNs()) { - initNameNodeConf(conf, nsId, 
nn.getNnId(), manageNameDfsDirs, nnCounter); + initNameNodeConf(conf, nsId, nn.getNnId(), manageNameDfsDirs, + enableManagedDfsDirsRedundancy, nnCounter); createNameNode(nnCounter++, conf, numDataNodes, false , operation, clusterId, nsId, nn.getNnId()); } @@ -707,8 +725,8 @@ public class MiniDFSCluster { private void initNameNodeConf(Configuration conf, String nameserviceId, String nnId, - boolean manageNameDfsDirs, int nnIndex) - throws IOException { + boolean manageNameDfsDirs, boolean enableManagedDfsDirsRedundancy, + int nnIndex) throws IOException { if (nameserviceId != null ) { conf.set(DFS_FEDERATION_NAMESERVICE_ID, nameserviceId); } @@ -717,12 +735,21 @@ public class MiniDFSCluster { } if (manageNameDfsDirs) { - conf.set(DFS_NAMENODE_NAME_DIR_KEY, - fileAsURI( new File(base_dir, "name" + (2*nnIndex + 1)))+ "," + - fileAsURI( new File(base_dir, "name" + (2*nnIndex + 2)))); - conf.set(DFS_NAMENODE_CHECKPOINT_DIR_KEY, - fileAsURI( new File(base_dir, "namesecondary" + (2*nnIndex + 1)))+ "," + - fileAsURI( new File(base_dir, "namesecondary" + (2*nnIndex + 2)))); + if (enableManagedDfsDirsRedundancy) { + conf.set(DFS_NAMENODE_NAME_DIR_KEY, + fileAsURI( new File(base_dir, "name" + (2*nnIndex + 1)))+ "," + + fileAsURI( new File(base_dir, "name" + (2*nnIndex + 2)))); + conf.set(DFS_NAMENODE_CHECKPOINT_DIR_KEY, + fileAsURI( new File(base_dir, "namesecondary" + (2*nnIndex + 1)))+ "," + + fileAsURI( new File(base_dir, "namesecondary" + (2*nnIndex + 2)))); + } else { + conf.set(DFS_NAMENODE_NAME_DIR_KEY, + fileAsURI( new File(base_dir, "name" + (2*nnIndex + 1))). + toString()); + conf.set(DFS_NAMENODE_CHECKPOINT_DIR_KEY, + fileAsURI( new File(base_dir, "namesecondary" + (2*nnIndex + 1))). + toString()); + } } } @@ -2118,7 +2145,7 @@ public class MiniDFSCluster { String nnId = null ; initNameNodeAddress(conf, nameserviceId, new NNConf(nnId).setIpcPort(namenodePort)); - initNameNodeConf(conf, nameserviceId, nnId, true , nnIndex); + initNameNodeConf(conf, nameserviceId, nnId, true , true , nnIndex); createNameNode(nnIndex, conf, numDataNodes, true , null , null , nameserviceId, nnId); diff --git hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java index cae3b0d..d0466f5 100644 --- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java +++ hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java @@ -22,6 +22,7 @@ import java.io.*; import java.net.URI; import java.util.Collection; import java.util.Iterator; + import java.util.LinkedList; import java.util.List; import java.util.ArrayList; import java.util.Collections; @@ -505,21 +506,28 @@ public class TestEditLog extends TestCase { FSImage fsimage = namesystem.getFSImage(); final FSEditLog editLog = fsimage.getEditLog(); fileSys.mkdirs( new Path( "/tmp" )); - StorageDirectory sd = fsimage.getStorage().dirIterator(NameNodeDirType.EDITS).next(); + Iterator<StorageDirectory> iter = fsimage.getStorage(). 
+ dirIterator(NameNodeDirType.EDITS); + LinkedList<StorageDirectory> sds = new LinkedList<StorageDirectory>(); + while (iter.hasNext()) { + sds.add(iter.next()); + } editLog.close(); cluster.shutdown(); - File editFile = NNStorage.getFinalizedEditsFile(sd, 1, 3); - assertTrue(editFile.exists()); - - long fileLen = editFile.length(); - System .out.println( "File name: " + editFile + " len: " + fileLen); - RandomAccessFile rwf = new RandomAccessFile(editFile, "rw" ); - rwf.seek(fileLen-4); // seek to checksum bytes - int b = rwf.readInt(); - rwf.seek(fileLen-4); - rwf.writeInt(b+1); - rwf.close(); + for (StorageDirectory sd : sds) { + File editFile = NNStorage.getFinalizedEditsFile(sd, 1, 3); + assertTrue(editFile.exists()); + + long fileLen = editFile.length(); + LOG.debug( "Corrupting Log File: " + editFile + " len: " + fileLen); + RandomAccessFile rwf = new RandomAccessFile(editFile, "rw" ); + rwf.seek(fileLen-4); // seek to checksum bytes + int b = rwf.readInt(); + rwf.seek(fileLen-4); + rwf.writeInt(b+1); + rwf.close(); + } try { cluster = new MiniDFSCluster.Builder(conf).numDataNodes(NUM_DATA_NODES).format( false ).build(); @@ -739,8 +747,9 @@ public class TestEditLog extends TestCase { throw ioe; } else { GenericTestUtils.assertExceptionContains( - "No non-corrupt logs for txid 3" , - ioe); + "Gap in transactions. Expected to be able to read up until " + + "at least txid 3 but unable to find any edit logs containing " + + "txid 3" , ioe); } } finally { cluster.shutdown(); @@ -769,12 +778,12 @@ public class TestEditLog extends TestCase { } @Override - public long getFirstTxId() throws IOException { + public long getFirstTxId() { return HdfsConstants.INVALID_TXID; } @Override - public long getLastTxId() throws IOException { + public long getLastTxId() { return HdfsConstants.INVALID_TXID; } @@ -1103,9 +1112,9 @@ public class TestEditLog extends TestCase { for (EditLogInputStream edits : editStreams) { FSEditLogLoader.EditLogValidation val = FSEditLogLoader.validateEditLog(edits); - long read = val.getNumTransactions(); + long read = (val.getEndTxId() - edits.getFirstTxId()) + 1; LOG.info( "Loading edits " + edits + " read " + read); - assertEquals(startTxId, val.getStartTxId()); + assertEquals(startTxId, edits.getFirstTxId()); startTxId += read; totaltxnread += read; } @@ -1153,7 +1162,9 @@ public class TestEditLog extends TestCase { fail( "Should have thrown exception" ); } catch (IOException ioe) { GenericTestUtils.assertExceptionContains( - "No non-corrupt logs for txid " + startGapTxId, ioe); + "Gap in transactions. Expected to be able to read up until " + + "at least txid 40 but unable to find any edit logs containing " + + "txid 11" , ioe); } } @@ -1225,6 +1236,114 @@ public class TestEditLog extends TestCase { byte [] garbage = new byte [r.nextInt(MAX_GARBAGE_LENGTH)]; r.nextBytes(garbage); validateNoCrash(garbage); + + } + } + + private static long readAllEdits(Collection<EditLogInputStream> streams, + long startTxId) throws IOException { + FSEditLogOp op; + long nextTxId = startTxId; + long numTx = 0; + for (EditLogInputStream s : streams) { + while ( true ) { + op = s.readOp(); + if (op == null ) + break ; + if (op.getTransactionId() != nextTxId) { + throw new IOException( "out of order transaction ID! expected " + + nextTxId + " but got " + op.getTransactionId() + " when " + + "reading " + s.getName()); + } + numTx++; + nextTxId = op.getTransactionId() + 1; + } + } + return numTx; + } + + /** + * Test edit log failover. 
If a single edit log is missing, other + * edits logs should be used instead. + */ + @Test + public void testEditLogFailOverFromMissing() throws IOException { + File f1 = new File(TEST_DIR + "/failover0" ); + File f2 = new File(TEST_DIR + "/failover1" ); + List<URI> editUris = ImmutableList.of(f1.toURI(), f2.toURI()); + + NNStorage storage = setupEdits(editUris, 3); + + final long startErrorTxId = 1*TXNS_PER_ROLL + 1; + final long endErrorTxId = 2*TXNS_PER_ROLL; + + File[] files = new File(f1, "current" ).listFiles( new FilenameFilter() { + public boolean accept(File dir, String name) { + if (name.startsWith(NNStorage.getFinalizedEditsFileName(startErrorTxId, + endErrorTxId))) { + return true ; + } + return false ; + } + }); + assertEquals(1, files.length); + assertTrue(files[0].delete()); + + FSEditLog editlog = getFSEditLog(storage); + editlog.initJournalsForWrite(); + long startTxId = 1; + try { + readAllEdits(editlog.selectInputStreams(startTxId, 4*TXNS_PER_ROLL), + startTxId); + } catch (IOException e) { + LOG.error( "edit log failover didn't work" , e); + fail( "Edit log failover didn't work" ); + } + } + + /** + * Test edit log failover from a corrupt edit log + */ + @Test + public void testEditLogFailOverFromCorrupt() throws IOException { + File f1 = new File(TEST_DIR + "/failover0" ); + File f2 = new File(TEST_DIR + "/failover1" ); + List<URI> editUris = ImmutableList.of(f1.toURI(), f2.toURI()); + + NNStorage storage = setupEdits(editUris, 3); + + final long startErrorTxId = 1*TXNS_PER_ROLL + 1; + final long endErrorTxId = 2*TXNS_PER_ROLL; + + File[] files = new File(f1, "current" ).listFiles( new FilenameFilter() { + public boolean accept(File dir, String name) { + if (name.startsWith(NNStorage.getFinalizedEditsFileName(startErrorTxId, + endErrorTxId))) { + return true ; + } + return false ; + } + }); + assertEquals(1, files.length); + + long fileLen = files[0].length(); + LOG.debug( "Corrupting Log File: " + files[0] + " len: " + fileLen); + RandomAccessFile rwf = new RandomAccessFile(files[0], "rw" ); + rwf.seek(fileLen-4); // seek to checksum bytes + int b = rwf.readInt(); + rwf.seek(fileLen-4); + rwf.writeInt(b+1); + rwf.close(); + + FSEditLog editlog = getFSEditLog(storage); + editlog.initJournalsForWrite(); + long startTxId = 1; + try { + readAllEdits(editlog.selectInputStreams(startTxId, 4*TXNS_PER_ROLL), + startTxId); + } catch (IOException e) { + LOG.error( "edit log failover didn't work" , e); + fail( "Edit log failover didn't work" ); } } } diff --git hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java index ebcec96..44066ae 100644 --- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java +++ hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java @@ -58,14 +58,15 @@ public class TestEditLogFileOutputStream { MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(0) .build(); + final long START_TXID = 1; StorageDirectory sd = cluster.getNameNode().getFSImage() .getStorage().getStorageDir(0); - File editLog = NNStorage.getInProgressEditsFile(sd, 1); + File editLog = NNStorage.getInProgressEditsFile(sd, START_TXID); EditLogValidation validation = EditLogFileInputStream.validateEditLog(editLog); assertEquals( "Edit log should contain a header as valid 
length" , HEADER_LEN, validation.getValidLength()); - assertEquals(1, validation.getNumTransactions()); + assertEquals(validation.getEndTxId(), START_TXID); assertEquals( "Edit log should have 1MB pre-allocated, plus 4 bytes " + " for the version number" , EditLogFileOutputStream.PREALLOCATION_LENGTH + 4, editLog.length()); @@ -79,7 +80,7 @@ public class TestEditLogFileOutputStream { assertTrue( "Edit log should have more valid data after writing a txn " + "(was: " + oldLength + " now: " + validation.getValidLength() + ")" , validation.getValidLength() > oldLength); - assertEquals(2, validation.getNumTransactions()); + assertEquals(1, validation.getEndTxId() - START_TXID); assertEquals( "Edit log should be 1MB long , plus 4 bytes for the version number" , EditLogFileOutputStream.PREALLOCATION_LENGTH + 4, editLog.length()); diff --git hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java index 1917dde..ba4b9a4 100644 --- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java +++ hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java @@ -28,6 +28,7 @@ import java.io.IOException; import java.io.RandomAccessFile; import java.nio.channels.FileChannel; import java.util.Map; + import java.util.Set; import java.util.SortedMap; import org.apache.commons.logging.impl.Log4JLogger; @@ -38,16 +39,23 @@ import org.apache.hadoop.hdfs.DFSConfigKeys; import org.apache.hadoop.hdfs.DFSTestUtil; import org.apache.hadoop.hdfs.HdfsConfiguration; import org.apache.hadoop.hdfs.MiniDFSCluster; + import org.apache.hadoop.hdfs.protocol.HdfsConstants; import org.apache.hadoop.hdfs.server.common.Storage.StorageDirectory; import org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.EditLogValidation; + import org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.DeleteOp; + import org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.OpInstanceCache; import org.apache.hadoop.hdfs.server.namenode.NNStorage.NameNodeDirType; import org.apache.hadoop.io.IOUtils; import org.apache.log4j.Level; import org.junit.Test; import com.google.common.collect.Maps; + import com.google.common.collect.Sets; import com.google.common.io.Files; + import static org.mockito.Mockito.doNothing; + import static org.mockito.Mockito.spy; + public class TestFSEditLogLoader { static { @@ -66,8 +74,8 @@ public class TestFSEditLogLoader { Configuration conf = new HdfsConfiguration(); MiniDFSCluster cluster = null ; FileSystem fileSys = null ; - cluster = new MiniDFSCluster.Builder(conf).numDataNodes(NUM_DATA_NODES) - .build(); + cluster = new MiniDFSCluster.Builder(conf).numDataNodes(NUM_DATA_NODES). + enableManagedDfsDirsRedundancy( false ).build(); cluster.waitActive(); fileSys = cluster.getFileSystem(); final FSNamesystem namesystem = cluster.getNamesystem(); @@ -152,108 +160,6 @@ public class TestFSEditLogLoader { } /** - * Test that the valid number of transactions can be counted from a file. - * @ throws IOException - */ - @Test - public void testCountValidTransactions() throws IOException { - File testDir = new File(TEST_DIR, "testCountValidTransactions" ); - File logFile = new File(testDir, - NNStorage.getInProgressEditsFileName(1)); - - // Create a log file, and return the offsets at which each - // transaction starts. 
- FSEditLog fsel = null ; - final int NUM_TXNS = 30; - SortedMap< Long , Long > offsetToTxId = Maps.newTreeMap(); - try { - fsel = FSImageTestUtil.createStandaloneEditLog(testDir); - fsel.openForWrite(); - assertTrue( "should exist: " + logFile, logFile.exists()); - - for ( int i = 0; i < NUM_TXNS; i++) { - long trueOffset = getNonTrailerLength(logFile); - long thisTxId = fsel.getLastWrittenTxId() + 1; - offsetToTxId.put(trueOffset, thisTxId); - System .err.println( "txid " + thisTxId + " at offset " + trueOffset); - fsel.logDelete( "path" + i, i); - fsel.logSync(); - } - } finally { - if (fsel != null ) { - fsel.close(); - } - } - - // The file got renamed when the log was closed. - logFile = testDir.listFiles()[0]; - long validLength = getNonTrailerLength(logFile); - - // Make sure that uncorrupted log has the expected length and number - // of transactions. - EditLogValidation validation = EditLogFileInputStream.validateEditLog(logFile); - assertEquals(NUM_TXNS + 2, validation.getNumTransactions()); - assertEquals(validLength, validation.getValidLength()); - - // Back up the uncorrupted log - File logFileBak = new File(testDir, logFile.getName() + ".bak" ); - Files.copy(logFile, logFileBak); - - // Corrupt the log file in various ways for each txn - for (Map.Entry< Long , Long > entry : offsetToTxId.entrySet()) { - long txOffset = entry.getKey(); - long txid = entry.getValue(); - - // Restore backup, truncate the file exactly before the txn - Files.copy(logFileBak, logFile); - truncateFile(logFile, txOffset); - validation = EditLogFileInputStream.validateEditLog(logFile); - assertEquals( "Failed when truncating to length " + txOffset, - txid - 1, validation.getNumTransactions()); - assertEquals(txOffset, validation.getValidLength()); - - // Restore backup, truncate the file with one byte in the txn, - // also isn't valid - Files.copy(logFileBak, logFile); - truncateFile(logFile, txOffset + 1); - validation = EditLogFileInputStream.validateEditLog(logFile); - assertEquals( "Failed when truncating to length " + (txOffset + 1), - txid - 1, validation.getNumTransactions()); - assertEquals(txOffset, validation.getValidLength()); - - // Restore backup, corrupt the txn opcode - Files.copy(logFileBak, logFile); - corruptByteInFile(logFile, txOffset); - validation = EditLogFileInputStream.validateEditLog(logFile); - assertEquals( "Failed when corrupting txn opcode at " + txOffset, - txid - 1, validation.getNumTransactions()); - assertEquals(txOffset, validation.getValidLength()); - - // Restore backup, corrupt a byte a few bytes into the txn - Files.copy(logFileBak, logFile); - corruptByteInFile(logFile, txOffset+5); - validation = EditLogFileInputStream.validateEditLog(logFile); - assertEquals( "Failed when corrupting txn data at " + (txOffset+5), - txid - 1, validation.getNumTransactions()); - assertEquals(txOffset, validation.getValidLength()); - } - - // Corrupt the log at every offset to make sure that validation itself - // never throws an exception, and that the calculated lengths are monotonically - // increasing - long prevNumValid = 0; - for ( long offset = 0; offset < validLength; offset++) { - Files.copy(logFileBak, logFile); - corruptByteInFile(logFile, offset); - EditLogValidation val = EditLogFileInputStream.validateEditLog(logFile); - assertTrue( String .format( "%d should have been >= %d" , - val.getNumTransactions(), prevNumValid), - val.getNumTransactions() >= prevNumValid); - prevNumValid = val.getNumTransactions(); - } - } - - /** * Corrupt the byte at the given offset in 
the given file, * by subtracting 1 from it. */ @@ -316,4 +222,116 @@ public class TestFSEditLogLoader { fis.close(); } } -} + + static private File prepareUnfinalizedTestEditLog(File testDir, int numTx, + SortedMap< Long , Long > offsetToTxId) throws IOException { + File inProgressFile = new File(testDir, NNStorage.getInProgressEditsFileName(1)); + FSEditLog fsel = null , spyLog = null ; + try { + fsel = FSImageTestUtil.createStandaloneEditLog(testDir); + spyLog = spy(fsel); + // Normally, the in-progress edit log would be finalized by + // FSEditLog#endCurrentLogSegment. For testing purposes, we + // disable that here. + doNothing().when(spyLog).endCurrentLogSegment( true ); + spyLog.openForWrite(); + assertTrue( "should exist: " + inProgressFile, inProgressFile.exists()); + + for ( int i = 0; i < numTx; i++) { + long trueOffset = getNonTrailerLength(inProgressFile); + long thisTxId = spyLog.getLastWrittenTxId() + 1; + offsetToTxId.put(trueOffset, thisTxId); + System .err.println( "txid " + thisTxId + " at offset " + trueOffset); + spyLog.logDelete( "path" + i, i); + spyLog.logSync(); + } + } finally { + if (spyLog != null ) { + spyLog.close(); + } else if (fsel != null ) { + fsel.close(); + } + } + return inProgressFile; + } + + @Test + public void testValidateEditLogWithCorruptHeader() throws IOException { + File testDir = new File(TEST_DIR, "testValidateEditLogWithCorruptHeader" ); + SortedMap< Long , Long > offsetToTxId = Maps.newTreeMap(); + File logFile = prepareUnfinalizedTestEditLog(testDir, 2, offsetToTxId); + RandomAccessFile rwf = new RandomAccessFile(logFile, "rw" ); + try { + rwf.seek(0); + rwf.writeLong(42); // corrupt header + } finally { + rwf.close(); + } + EditLogValidation validation = EditLogFileInputStream.validateEditLog(logFile); + assertTrue(validation.hasCorruptHeader()); + } + + @Test + public void testValidateEditLogWithCorruptBody() throws IOException { + File testDir = new File(TEST_DIR, "testValidateEditLogWithCorruptBody" ); + SortedMap< Long , Long > offsetToTxId = Maps.newTreeMap(); + final int NUM_TXNS = 20; + File logFile = prepareUnfinalizedTestEditLog(testDir, NUM_TXNS, + offsetToTxId); + // Back up the uncorrupted log + File logFileBak = new File(testDir, logFile.getName() + ".bak" ); + Files.copy(logFile, logFileBak); + EditLogValidation validation = + EditLogFileInputStream.validateEditLog(logFile); + assertTrue(!validation.hasCorruptHeader()); + // We expect that there will be an OP_START_LOG_SEGMENT, followed by + // NUM_TXNS opcodes, followed by an OP_END_LOG_SEGMENT. + assertEquals(NUM_TXNS + 1, validation.getEndTxId()); + // Corrupt each edit and verify that validation continues to work + for (Map.Entry< Long , Long > entry : offsetToTxId.entrySet()) { + long txOffset = entry.getKey(); + long txId = entry.getValue(); + + // Restore backup, corrupt the txn opcode + Files.copy(logFileBak, logFile); + FSImage.LOG.error( "WATERMELON txId = " + txId + ", txOffset = " + txOffset); + corruptByteInFile(logFile, txOffset); + validation = EditLogFileInputStream.validateEditLog(logFile); + long expectedEndTxId = (txId == (NUM_TXNS + 1)) ? 
+ NUM_TXNS : (NUM_TXNS + 1); + assertEquals( "Failed when corrupting txn opcode at " + txOffset, + expectedEndTxId, validation.getEndTxId()); + assertTrue(!validation.hasCorruptHeader()); + } + + // Truncate right before each edit and verify that validation continues + // to work + for (Map.Entry< Long , Long > entry : offsetToTxId.entrySet()) { + long txOffset = entry.getKey(); + long txId = entry.getValue(); + + // Restore backup, corrupt the txn opcode + Files.copy(logFileBak, logFile); + truncateFile(logFile, txOffset); + validation = EditLogFileInputStream.validateEditLog(logFile); + long expectedEndTxId = (txId == 0) ? + HdfsConstants.INVALID_TXID : (txId - 1); + assertEquals( "Failed when corrupting txid " + txId + " txn opcode " + + "at " + txOffset, expectedEndTxId, validation.getEndTxId()); + assertTrue(!validation.hasCorruptHeader()); + } + } + + @Test + public void testValidateEmptyEditLog() throws IOException { + File testDir = new File(TEST_DIR, "testValidateEmptyEditLog" ); + SortedMap< Long , Long > offsetToTxId = Maps.newTreeMap(); + File logFile = prepareUnfinalizedTestEditLog(testDir, 0, offsetToTxId); + // Truncate the file so that there is nothing except the header + truncateFile(logFile, 4); + EditLogValidation validation = + EditLogFileInputStream.validateEditLog(logFile); + assertTrue(!validation.hasCorruptHeader()); + assertEquals(HdfsConstants.INVALID_TXID, validation.getEndTxId()); + } +} \ No newline at end of file diff --git hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java index 0ac1944..c13ff55 100644 --- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java +++ hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java @@ -44,6 +44,44 @@ import com.google.common.base.Joiner; public class TestFileJournalManager { + /** + * Find out how many transactions we can read from a + * FileJournalManager, starting at a given transaction ID. + * + * @param jm The journal manager + * @param fromTxId Transaction ID to start at + * @param inProgressOk Should we consider edit logs that are not finalized? + * @ return The number of transactions + * @ throws IOException + */ + static long getNumberOfTransactions(FileJournalManager jm, long fromTxId, + boolean inProgressOk) throws IOException { + long txId = fromTxId; + long numTransactions = 0; + EditLogInputStream elis; + while ( true ) { + try { + elis = jm.getInputStream(txId, inProgressOk); + } catch (IOException e) { + if (e.getMessage().startsWith( "Cannot find editlog file containing " )) { + break ; + } else { + throw e; + } + } + while ( true ) { + FSEditLogOp op = elis.readOp(); + if (op == null ) { + break ; + } + txId = op.getTransactionId(); + numTransactions++; + } + txId++; + } + return numTransactions; + } + /** * Test the normal operation of loading transactions from * file journal manager. 
3 edits directories are setup without any @@ -61,7 +99,7 @@ public class TestFileJournalManager { long numJournals = 0; for (StorageDirectory sd : storage.dirIterable(NameNodeDirType.EDITS)) { FileJournalManager jm = new FileJournalManager(sd, storage); - assertEquals(6*TXNS_PER_ROLL, jm.getNumberOfTransactions(1, true )); + assertEquals(6*TXNS_PER_ROLL, getNumberOfTransactions(jm, 1, true )); numJournals++; } assertEquals(3, numJournals); @@ -82,7 +120,7 @@ public class TestFileJournalManager { FileJournalManager jm = new FileJournalManager(sd, storage); assertEquals(5*TXNS_PER_ROLL + TXNS_PER_FAIL, - jm.getNumberOfTransactions(1, true )); + getNumberOfTransactions(jm, 1, true )); } /** @@ -104,16 +142,16 @@ public class TestFileJournalManager { Iterator<StorageDirectory> dirs = storage.dirIterator(NameNodeDirType.EDITS); StorageDirectory sd = dirs.next(); FileJournalManager jm = new FileJournalManager(sd, storage); - assertEquals(6*TXNS_PER_ROLL, jm.getNumberOfTransactions(1, true )); + assertEquals(6*TXNS_PER_ROLL, getNumberOfTransactions(jm, 1, true )); sd = dirs.next(); jm = new FileJournalManager(sd, storage); - assertEquals(5*TXNS_PER_ROLL + TXNS_PER_FAIL, jm.getNumberOfTransactions(1, + assertEquals(5*TXNS_PER_ROLL + TXNS_PER_FAIL, getNumberOfTransactions(jm, 1, true )); sd = dirs.next(); jm = new FileJournalManager(sd, storage); - assertEquals(6*TXNS_PER_ROLL, jm.getNumberOfTransactions(1, true )); + assertEquals(6*TXNS_PER_ROLL, getNumberOfTransactions(jm, 1, true )); } /** @@ -137,17 +175,17 @@ public class TestFileJournalManager { Iterator<StorageDirectory> dirs = storage.dirIterator(NameNodeDirType.EDITS); StorageDirectory sd = dirs.next(); FileJournalManager jm = new FileJournalManager(sd, storage); - assertEquals(5*TXNS_PER_ROLL + TXNS_PER_FAIL, jm.getNumberOfTransactions(1, + assertEquals(5*TXNS_PER_ROLL + TXNS_PER_FAIL, getNumberOfTransactions(jm, 1, true )); sd = dirs.next(); jm = new FileJournalManager(sd, storage); - assertEquals(5*TXNS_PER_ROLL + TXNS_PER_FAIL, jm.getNumberOfTransactions(1, + assertEquals(5*TXNS_PER_ROLL + TXNS_PER_FAIL, getNumberOfTransactions(jm, 1, true )); sd = dirs.next(); jm = new FileJournalManager(sd, storage); - assertEquals(5*TXNS_PER_ROLL + TXNS_PER_FAIL, jm.getNumberOfTransactions(1, + assertEquals(5*TXNS_PER_ROLL + TXNS_PER_FAIL, getNumberOfTransactions(jm, 1, true )); } @@ -198,17 +236,17 @@ public class TestFileJournalManager { FileJournalManager jm = new FileJournalManager(sd, storage); long expectedTotalTxnCount = TXNS_PER_ROLL*10 + TXNS_PER_FAIL; - assertEquals(expectedTotalTxnCount, jm.getNumberOfTransactions(1, true )); + assertEquals(expectedTotalTxnCount, getNumberOfTransactions(jm, 1, true )); long skippedTxns = (3*TXNS_PER_ROLL); // skip first 3 files long startingTxId = skippedTxns + 1; - long numTransactionsToLoad = jm.getNumberOfTransactions(startingTxId, true ); + long numTransactionsToLoad = getNumberOfTransactions(jm, startingTxId, true ); long numLoaded = 0; while (numLoaded < numTransactionsToLoad) { EditLogInputStream editIn = jm.getInputStream(startingTxId, true ); FSEditLogLoader.EditLogValidation val = FSEditLogLoader.validateEditLog(editIn); - long count = val.getNumTransactions(); + long count = (val.getEndTxId() - startingTxId) + 1; editIn.close(); startingTxId += count; @@ -236,7 +274,7 @@ public class TestFileJournalManager { // 10 rolls, so 11 rolled files, 110 txids total. 
final int TOTAL_TXIDS = 10 * 11; for ( int txid = 1; txid <= TOTAL_TXIDS; txid++) { - assertEquals((TOTAL_TXIDS - txid) + 1, jm.getNumberOfTransactions(txid, + assertEquals((TOTAL_TXIDS - txid) + 1, getNumberOfTransactions(jm, txid, true )); } } @@ -269,10 +307,10 @@ public class TestFileJournalManager { assertTrue(files[0].delete()); FileJournalManager jm = new FileJournalManager(sd, storage); - assertEquals(startGapTxId-1, jm.getNumberOfTransactions(1, true )); + assertEquals(startGapTxId-1, getNumberOfTransactions(jm, 1, true )); try { - jm.getNumberOfTransactions(startGapTxId, true ); + getNumberOfTransactions(jm, startGapTxId, true ); fail( "Should have thrown an exception by now" ); } catch (IOException ioe) { GenericTestUtils.assertExceptionContains( @@ -281,7 +319,7 @@ public class TestFileJournalManager { // rolled 10 times so there should be 11 files. assertEquals(11*TXNS_PER_ROLL - endGapTxId, - jm.getNumberOfTransactions(endGapTxId + 1, true )); + getNumberOfTransactions(jm, endGapTxId + 1, true )); } /** @@ -308,7 +346,7 @@ public class TestFileJournalManager { FileJournalManager jm = new FileJournalManager(sd, storage); assertEquals(10*TXNS_PER_ROLL+1, - jm.getNumberOfTransactions(1, true )); + getNumberOfTransactions(jm, 1, true )); } @Test @@ -381,7 +419,7 @@ public class TestFileJournalManager { FileJournalManager jm = new FileJournalManager(sd, storage); // If we exclude the in-progess stream, we should only have 100 tx. - assertEquals(100, jm.getNumberOfTransactions(1, false )); + assertEquals(100, getNumberOfTransactions(jm, 1, false )); EditLogInputStream elis = jm.getInputStream(90, false ); FSEditLogOp lastReadOp = null ; diff --git hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGenericJournalConf.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGenericJournalConf.java index 51e49a9..f677557 100644 --- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGenericJournalConf.java +++ hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGenericJournalConf.java @@ -150,12 +150,6 @@ public class TestGenericJournalConf { } @Override - public long getNumberOfTransactions( long fromTxnId, boolean inProgressOk) - throws IOException { - return 0; - } - - @Override public void setOutputBufferCapacity( int size) {} @Override diff --git hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRecovery.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRecovery.java index 5a86fbf..48f8f95 100644 --- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRecovery.java +++ hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRecovery.java @@ -333,7 +333,7 @@ public class TestNameNodeRecovery { static void testNameNodeRecoveryImpl(Corruptor corruptor, boolean finalize) throws IOException { final String TEST_PATH = "/test/path/dir" ; - final int NUM_TEST_MKDIRS = 10; + final String TEST_PATH2 = "/second/dir" ; final boolean needRecovery = corruptor.needRecovery(finalize); // start a cluster @@ -342,8 +342,8 @@ public class TestNameNodeRecovery { FileSystem fileSys = null ; StorageDirectory sd = null ; try { - cluster = new MiniDFSCluster.Builder(conf).numDataNodes(0) - .build(); + cluster = new MiniDFSCluster.Builder(conf).numDataNodes(0). 
+ enableManagedDfsDirsRedundancy( false ).build(); cluster.waitActive(); if (!finalize) { // Normally, the in-progress edit log would be finalized by @@ -357,9 +357,8 @@ public class TestNameNodeRecovery { fileSys = cluster.getFileSystem(); final FSNamesystem namesystem = cluster.getNamesystem(); FSImage fsimage = namesystem.getFSImage(); - for ( int i = 0; i < NUM_TEST_MKDIRS; i++) { - fileSys.mkdirs( new Path(TEST_PATH)); - } + fileSys.mkdirs( new Path(TEST_PATH)); + fileSys.mkdirs( new Path(TEST_PATH2)); sd = fsimage.getStorage().dirIterator(NameNodeDirType.EDITS).next(); } finally { if (cluster != null ) { @@ -371,6 +370,7 @@ public class TestNameNodeRecovery { assertTrue( "Should exist: " + editFile, editFile.exists()); // Corrupt the edit log + LOG.info( "corrupting edit log file '" + editFile + "'" ); corruptor.corrupt(editFile); // If needRecovery == true , make sure that we can't start the @@ -378,8 +378,8 @@ public class TestNameNodeRecovery { cluster = null ; try { LOG.debug( "trying to start normally ( this should fail)..." ); - cluster = new MiniDFSCluster.Builder(conf).numDataNodes(0) - .format( false ).build(); + cluster = new MiniDFSCluster.Builder(conf).numDataNodes(0). + enableManagedDfsDirsRedundancy( false ).format( false ).build(); cluster.waitActive(); cluster.shutdown(); if (needRecovery) { @@ -403,8 +403,9 @@ public class TestNameNodeRecovery { cluster = null ; try { LOG.debug( "running recovery..." ); - cluster = new MiniDFSCluster.Builder(conf).numDataNodes(0) - .format( false ).startupOption(recoverStartOpt).build(); + cluster = new MiniDFSCluster.Builder(conf).numDataNodes(0). + enableManagedDfsDirsRedundancy( false ).format( false ). + startupOption(recoverStartOpt).build(); } catch (IOException e) { fail( "caught IOException while trying to recover. " + "message was " + e.getMessage() + @@ -420,7 +421,7 @@ public class TestNameNodeRecovery { try { LOG.debug( "starting cluster normally after recovery..." ); cluster = new MiniDFSCluster.Builder(conf).numDataNodes(0) - .format( false ).build(); + .enableManagedDfsDirsRedundancy( false ).format( false ).build(); LOG.debug( "successfully recovered the " + corruptor.getName() + " corrupted edit log" ); assertTrue(cluster.getFileSystem().exists( new Path(TEST_PATH)));
          Todd Lipcon added a comment -

          er.. that sort of works.. maybe easier to push it to a public github repo?

          Colin Patrick McCabe added a comment -

          submit through JIRA

          Colin Patrick McCabe added a comment -

          latest patch fixes inProgress handling for RedundantEditLogInputStream

          Todd Lipcon added a comment -
          • You're still missing the license and annotations on RedundantEditLogInputStream.java
          +    // and can't be pre-transactional.
          +    for (EditLogInputStream s : streams) {
          +      Preconditions.checkArgument(s.getFirstTxId() !=
          +          HdfsConstants.INVALID_TXID);
          +      Preconditions.checkArgument(s.getLastTxId() !=
          +          HdfsConstants.INVALID_TXID);
          +    }
          

          Can you add a format string argument to these checks, so that if they fail, it will print s as a string? i.e. checkArgument(..., "bad stream: %s", s);
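          As an illustration only (not code from the patch), the same checks with a format-string argument would look something like this, using the Guava overload Preconditions.checkArgument(boolean, String template, Object... args):

          // Sketch: same checks as the quoted hunk, but a failing precondition
          // now reports which stream was bad in the exception message.
          for (EditLogInputStream s : streams) {
            Preconditions.checkArgument(s.getFirstTxId() != HdfsConstants.INVALID_TXID,
                "bad stream (missing first txid): %s", s);
            Preconditions.checkArgument(s.getLastTxId() != HdfsConstants.INVALID_TXID,
                "bad stream (missing last txid): %s", s);
          }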


          +    /* We sort the streams here so that the streams that end later come first.
          +     */
          

          Style (// for inline comments, see above)


          +        LOG.error("Got error reading edit log input stream " +
          +          streams[curIdx].getName(), prevException);
          

          Will it have already logged the offset of the error? Or will the exception itself contain the offset? Otherwise we should include it in the error message.


          • getPosition() in the merged stream now returns the position of the underlying stream, which increases as we read one file and then resets back to zero. But, in FSEditLog, we track these offsets for error reporting purposes. We need to make sure that, if there is an unrecoverable corruption, the log messages specifically identify the path and offset of the corruption. I'm not sure that's the case, now that we have the extra abstraction here. Can you try using a single storage dir and corrupting the logs somewhere in a middle segment?
          • Can you add a note to the javadoc for the redundant stream that it doesn't handle the "ping pong" scenario? i.e. that if a segment has an error, we will discard that segment and then move to the next one?
          • Regarding memory usage: I'm afraid that each stream opened will end up maintaining a large buffer, since it's generally wrapped with BufferedInputStream, and we use mark(100MB). Maybe we should close each stream as soon as we finish with it, rather than waiting until the close() call at the end. Have you tested loading a large edit log composed of many segments? e.g. a total 1GB log, made of 10 100MB segments, on a NN with say 1G heap?
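          To make the memory point concrete, here is a small standalone illustration (assuming a file path as the first argument; this is not code from the patch). While a mark() with a large readlimit is outstanding, BufferedInputStream keeps everything read since the mark in its internal buffer so that reset() can replay it, so many such streams open at once can pin a lot of heap:

          import java.io.BufferedInputStream;
          import java.io.FileInputStream;
          import java.io.IOException;
          import java.io.InputStream;

          public class MarkReadLimitDemo {
            public static void main(String[] args) throws IOException {
              InputStream in = new BufferedInputStream(new FileInputStream(args[0]));
              in.mark(100 * 1024 * 1024);            // promise we may reset within 100MB
              byte[] chunk = new byte[8192];
              long total = 0;
              int n;
              while (total < 50L * 1024 * 1024 && (n = in.read(chunk)) != -1) {
                total += n;                          // everything read here stays buffered
              }
              in.reset();                            // legal: the buffered bytes are replayed
              in.close();
            }
          }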
          Colin Patrick McCabe added a comment -

          Will [the RedundantEditLogInputStream error message] have already logged the offset of the error? Or will the exception itself contain the offset? Otherwise we should include it in the error message.

          Yes, the exception itself contains the offsets, as generated by FSEditLogLoader. So there's no need to re-add them here.

          getPosition() in the merged stream now returns the position of the underlying stream, which increases as we read one file and then resets back to zero. But, in FSEditLog, we track these offsets for error reporting purposes. We need to make sure that, if there is an unrecoverable corruption, the log messages specifically identify the path and offset of the corruption. I'm not sure that's the case, now that we have the extra abstraction here. Can you try using a single storage dir and corrupting the logs somewhere in a middle segment?

          I think EditLogInputStream#getPosition just needs to go away and be replaced by a function that gives a human-readable description of "where you are." This is especially true because we're soon going to have some edit logs like the quorum edit logs where there is no real concept of file position. I'm not sure if I'm brave enough to try to cram that into this change, though.

          I will run through a recovery scenario and make sure the current printouts make sense and can be followed. I'm pretty sure they do, but it's good to check.

          Regarding memory usage: I'm afraid that each stream opened will end up maintaining a large buffer, since it's generally wrapped with BufferedInputStream, and we use mark(100MB). Maybe we should close each stream as soon as we finish with it, rather than waiting until the close() call at the end. Have you tested loading a large edit log composed of many segments? eg a total 1GB log, made of 10 100MB segments, on a NN with say 1G heap?

          Good catch. I think I have a solution for this one. Will post shortly.

          Colin Patrick McCabe added a comment -
          • Reduce MAX_OP_SIZE to 1.5 megabytes
          • Add StreamLimiter interface. This will let us get an IOException when we read more than 1.5 megabytes out of a stream. Basically, the idea is to enforce MAX_OP_SIZE at read time, rather than failing later during in.reset() because in.mark() was not set high enough (see the sketch after this list).
          • add ASF header to RedundantEditLogInputStream
          • test that PREALLOCATION_LENGTH < MAX_OP_SIZE
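          A minimal sketch of the read-time enforcement idea. The class and names below are invented for illustration and are not the StreamLimiter interface from the patch; the point is simply that the limit is checked as bytes are consumed, instead of relying on mark()/reset() having reserved enough buffer:

          import java.io.FilterInputStream;
          import java.io.IOException;
          import java.io.InputStream;

          class LimitedInputStream extends FilterInputStream {
            private final long limit;   // e.g. a per-operation cap such as MAX_OP_SIZE
            private long consumed;

            LimitedInputStream(InputStream in, long limit) {
              super(in);
              this.limit = limit;
            }

            private void check(long n) throws IOException {
              consumed += n;
              if (consumed > limit) {
                throw new IOException("read " + consumed + " bytes, which exceeds the "
                    + "per-operation limit of " + limit);
              }
            }

            @Override
            public int read() throws IOException {
              int b = in.read();
              if (b != -1) check(1);    // count single-byte reads against the limit
              return b;
            }

            @Override
            public int read(byte[] buf, int off, int len) throws IOException {
              int n = in.read(buf, off, len);
              if (n > 0) check(n);      // count bulk reads against the limit
              return n;
            }
          }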
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12527527/HDFS-3049.012.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified test files.

          -1 javac. The patch appears to cause the build to fail.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2445//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -
          • fix BK compile
          • add testManyEditLogSegments
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12527547/HDFS-3049.013.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal:

          org.apache.hadoop.hdfs.TestDFSRollback
          org.apache.hadoop.hdfs.server.namenode.TestFileJournalManager

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2446//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2446//artifact/trunk/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2446//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -

          I've got another rev of this that fixes the "O(n^2) calls to readdir" problem. I'll post it as soon as it passes all the unit tests for me.

          I found that there was an additional complication. EditLogFileInputStream needs to avoid opening a file descriptor in its constructor. Otherwise, materializing a list of EditLogFileInputStream objects could open thousands of file descriptors. So I needed to open the file opportunistically the first time we call readOp. I think this will also boost NFS performance, since it means less I/O in general.
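          The lazy-open pattern being described looks roughly like the sketch below. Field and method names are illustrative, not the actual EditLogFileInputStream code: the constructor only records the file, and the descriptor is opened on the first read.

          import java.io.File;
          import java.io.FileInputStream;
          import java.io.IOException;

          class LazyLogStream {
            private final File file;
            private FileInputStream in;   // null until the first read

            LazyLogStream(File file) {
              this.file = file;           // cheap: no file descriptor opened yet
            }

            private void ensureOpen() throws IOException {
              if (in == null) {
                in = new FileInputStream(file);   // opened only when data is needed
              }
            }

            int readByte() throws IOException {
              ensureOpen();
              return in.read();
            }

            void close() throws IOException {
              if (in != null) {
                in.close();
              }
            }
          }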

          Colin Patrick McCabe added a comment -
          • get rid of the old JournalManager#getInputStream API, which selected a single EditLogInputStream. This forced us to do a readdir() operation per edit log segment in FileJournalManager, since we had to iterate over all files in the directory each time getInputStream was called.
          • The new API, selectInputStreams, adds all the streams it can to a collection. It never throws. One subtle improvement here is that previously we might sometimes ignore an edit log simply because another edit log threw an exception when being examined in getInputStream. This is fixed now (see the sketch after this list).
          • EditLogFileInputStream: don't open any file descriptors when the edit log is first materialized. Since we're creating all the edit log structures "up front" now, this is important.
          • EditLogFileInputStream: add class LOG, rather than using FSImage's LOG.
          • EditLogInputStream#nextValidOp (default trivial implementation): catch RuntimeException as well as IOException. nextValidOp is not supposed to throw.
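          Rough shape of the collection-based API described in the first two bullets. The names and signatures below are paraphrased for illustration and are not copied from the patch; the point is that each journal contributes whatever streams it can, and trouble in one journal no longer hides the streams of the others:

          import java.io.IOException;
          import java.util.ArrayList;
          import java.util.Collection;
          import java.util.List;

          class LogStream {
            final long firstTxId;
            LogStream(long firstTxId) { this.firstTxId = firstTxId; }
          }

          interface Journal {
            // Add every readable stream starting at or after fromTxId to the accumulator.
            void selectInputStreams(Collection<LogStream> accumulator, long fromTxId)
                throws IOException;
          }

          class JournalSet {
            private final List<Journal> journals = new ArrayList<Journal>();

            List<LogStream> selectInputStreams(long fromTxId) {
              List<LogStream> streams = new ArrayList<LogStream>();
              for (Journal j : journals) {
                try {
                  j.selectInputStreams(streams, fromTxId);   // add everything it can
                } catch (IOException e) {
                  // keep going: a misbehaving journal should not hide the others' streams
                }
              }
              return streams;
            }
          }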
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12527733/HDFS-3049.015.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal:

          org.apache.hadoop.hdfs.TestDFSUpgrade
          org.apache.hadoop.hdfs.server.namenode.TestFSEditLogLoader
          org.apache.hadoop.hdfs.TestDFSRollback
          org.apache.hadoop.hdfs.server.namenode.TestFileJournalManager
          org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2449//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2449//artifact/trunk/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2449//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12527802/HDFS-3049.017.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 10 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal:

          org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2458//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2458//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -
          • fix mockito stuff in TestFailureToReadEdits
          Colin Patrick McCabe added a comment -

          general description of the changes between 15 and 17/18 (sorry for not posting this earlier, it was late):

          • update some unit tests. In particular TestFileJournalManager#getNumberOfTransactions now takes a parameter that specifies whether it stops counting transactions at a gap, or not. Some functions in the test want one behavior or the other.
          • fix an off-by-one error in checkForGaps.
          • remove some dead code that was causing a findbugs warning
          • fix a case where String.format was getting the wrong number of args
          • fix validation of files with corrupt headers (basically, force a header read).
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12527878/HDFS-3049.018.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 10 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2462//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2462//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -

          (note: the javadoc warnings relate to gridmix and were not introduced by this change)

          Todd Lipcon added a comment -
          +      } catch (RuntimeException e) {
          +        LOG.error("caught exception initializing " + this, e);
          +        state = State.CLOSED;
          +        return null;
          +      } catch (IOException e) {
          +        LOG.error("caught exception initializing " + this, e);
          +        state = State.CLOSED;
          +        return null;
          +      }
          

          Why not simply catch Throwable here?

          Also, I'm not sure why the exception is swallowed instead of rethrown. If it fails to open the edit log, shouldn't that generate an exception on init()? Should we make init() a public interface (eg "open()") instead, so that the caller is cognizant of the flow here, instead of doing it lazily? I think that would also simplify the other functions, which could just do a Preconditions.checkState(state == State.OPEN) instead of always handling lazy-init.
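          For illustration, the explicit-open pattern suggested above might look roughly like this (a sketch only; the open() method, the State values, and the createReader() helper are assumptions here, not the patch's actual API):

            // Hypothetical sketch: open() fails fast; readers just assert the state.
            // (Preconditions is com.google.common.base.Preconditions.)
            private enum State { UNINIT, OPEN, CLOSED }
            private State state = State.UNINIT;

            public void open() throws IOException {
              Preconditions.checkState(state == State.UNINIT,
                  "cannot open stream in state " + state);
              try {
                reader = createReader();        // assumed helper; may throw
                state = State.OPEN;
              } catch (Throwable t) {
                state = State.CLOSED;
                throw new IOException("failed to open " + this, t);
              }
            }

            public FSEditLogOp readOp() throws IOException {
              Preconditions.checkState(state == State.OPEN, "stream is not open");
              return reader.readOp(false);
            }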


          +          LOG.info("skipping the rest of " + this + " since we " +
          +              "reached txId " + txId);
          +          close();
          

          Why do you close() here but not close() in the normal case where you reach the end of the log? It seems it should be up to the caller to close upon hitting the "eof" (null txn) either way.


               try {
          -      return reader.readOp(true);
          +      return nextOpImpl(true);
               } catch (IOException e) {
          +      LOG.error("nextValidOp: got exception while reading " + this, e);
          +      return null;
          +    } catch (RuntimeException e) {
          +      LOG.error("nextValidOp: got exception while reading " + this, e);
                 return null;
               }
          

          again, why not just catch Throwable?


          +    if (!streams.isEmpty()) {
          +      String error = String.format("Cannot start writing at txid %s " +
          +        "when there is a stream available for read: %s",
          +        segmentTxId, streams.get(0));
          +      IOUtils.cleanup(LOG, streams.toArray(new EditLogInputStream[0]));
          +      throw new RuntimeException(error);
               }
          

          IllegalStateException would be more appropriate here
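          That is, roughly (the same snippet as above with only the thrown type changed; sketch only):

            if (!streams.isEmpty()) {
              String error = String.format("Cannot start writing at txid %s " +
                  "when there is a stream available for read: %s",
                  segmentTxId, streams.get(0));
              IOUtils.cleanup(LOG, streams.toArray(new EditLogInputStream[0]));
              throw new IllegalStateException(error);
            }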


          Changes to PositionTrackingInputStream: can you refactor out a function like checkLimit(int amountToRead); here? Lots of duplicate code.
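          Something like the following hypothetical helper could replace the inline checks (the curPos and limitPos field names are assumptions about PositionTrackingInputStream's internals, not necessarily the real ones):

            // Hypothetical shared bounds check, called from read(), read(byte[], int, int), etc.
            private void checkLimit(long amountToRead) throws IOException {
              if (limitPos != -1 && curPos + amountToRead > limitPos) {
                throw new IOException("Trying to read " + amountToRead + " byte(s) past " +
                    "the stream limit (curPos=" + curPos + ", limitPos=" + limitPos + ")");
              }
            }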

          Why is the opcode size changed from 100MB to 1.5MB? Didn't you just change it to 100MB recently?

          Also, why is this change to the limiting behavior lumped in here? It's hard to review when the patch has a lot of distinct changes put together.

          StreamLimiter needs an interface annotation, or be made package private.


          • There are a lot of new LOG.info messages which look more like they should be debug level. I don't think the operator would be able to make sense of all this output.

          How hard would it be to separate this into three patches?
          1) Bug fix which uses the new StreamLimiter to fix the issue you mentioned higher up in the comments (and seems distinct from the rest)
          2) Change the API to get rid of getInputStream() and fix the O(n^2) behavior
          3) Introduce RedundantInputStream to solve the issue described in this JIRA

          I think there really are three separate things going on here and the 120KB patch is difficult to digest.

          Colin Patrick McCabe added a comment -

          Also, I'm not sure why the exception is swallowed instead of rethrown. If it fails to open the edit log, shouldn't that generate an exception on init()? Should we make init() a public interface (eg "open()") instead, so that the caller is cognizant of the flow here, instead of doing it lazily? I think that would also simplify the other functions, which could just do a Preconditions.checkState(state == State.OPEN) instead of always handling lazy-init.

          You're right-- the exception should be thrown if resync == false.

          As for creating a public init() method-- I guess, but this patch is getting kind of big already. Perhaps we could file a separate JIRA for that? I also have some other API ideas that might improve efficiency (not going to discuss them here due to space constraints)

          Why do you close() here but not close() in the normal case where you reach the end of the log? It seems it should be up to the caller to close upon hitting the "eof" (null txn) either way.

          The rationale behind this was discussed in HDFS-3335. Basically, if there is corruption at the end of the log, but we read everything we were supposed to, we don't want to throw an exception. As for closing in the EOF case, that seems unnecessary. The caller has to call close() anyway; that's part of the contract for this API. So we don't really add any value by doing it automatically.

          again, why not just catch Throwable?

          Yeah, we should do that. Will fix.

          IllegalStateException would be more appropriate here

          ok

          [streamlimiter comments]

          agree with most/all of this. I think this can be separated out (probably)

          [log message comments]

          yes, probably some of those should be debug messages, at least the ones which just describe "situation normal, added new stream" etc.

          [separate into 3 patches]
          well, it's worth a try. There are some non-obvious dependencies, but I'll give it a try.

          Colin Patrick McCabe added a comment -

          smaller patch which strips out RedundantEditLogStream, StreamLimiter

          Fixed some comments, addressed todd's comments. Many Log.info messages changed to debug, etc.

          Colin Patrick McCabe added a comment -

          FYI: I'm posting the patch for the startup performance stuff over at HDFS-2982.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12527991/HDFS-3049.021.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal:

          org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2467//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2467//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -

          new patch based on trunk (after the merge of HDFS-2982, etc.)

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12529262/HDFS-3049.023.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.TestFSEditLogLoader

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2515//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2515//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -
          • use PriorityQueue rather than Set. The former can contain duplicates, so our comparator does not need to be as precise (see the sketch after this list).
          • fix test failure that came from incorrectly assuming that edit log failover was not in effect
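          A rough illustration of the difference (the EditLogInputStream accessor name is assumed; this is not the exact comparator from the patch):

            // A PriorityQueue keeps both streams even when the comparator returns 0;
            // a TreeSet would treat compare() == 0 as a duplicate and silently drop one.
            PriorityQueue<EditLogInputStream> queue =
                new PriorityQueue<EditLogInputStream>(64,
                    new Comparator<EditLogInputStream>() {
                      @Override
                      public int compare(EditLogInputStream a, EditLogInputStream b) {
                        long x = a.getFirstTxId(), y = b.getFirstTxId();
                        return (x < y) ? -1 : ((x == y) ? 0 : 1);
                      }
                    });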
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12529662/HDFS-3049.025.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestDatanodeBlockScanner

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2517//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2517//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -
          • rebase on trunk
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12530077/HDFS-3049.026.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2533//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2533//console

          This message is automatically generated.

          Todd Lipcon added a comment -
          + * We wil currently try each edit log stream exactly once.  In other words, we
          

          typo: 'wil'


          • please add a blank line before and after the definition of the State enum
          • there are some WATERMELONs in your patch that I think you should probably remove
          • within a given RedundantEditLogInputStream, there's an expectation that they all have the same start txid, right? you should check this in your preconditions loop
          • not sure of the logic for EOF: let's say I have two streams, one is tx 1-15, and the other is 1-20. When we sort, they'll be in the order (1-20, 1-15). I then encounter an error at txid #5 in the first stream, so I switch to the second stream. This stream will then return "null" after reading txid #15, even though there are really 5 more txns in the group. Right?

          +              streams[curIdx].getName() + ".  During automatic failover, " +
          +              "we noticed that all of the remaining edit log streams are " +
          +              "shorter than the current one!  The best " + 
          

          I don't like using the term "automatic failover" here - because that's the terminology we use for HA. Instead, perhaps something like "We could not find any other edit log which contains transactions following txid %d"?

          +        LOG.error("Got error reading edit log input stream " +
          +          streams[curIdx].getName(), prevException);
          +        LOG.error("failing over to edit log " + 
          +          streams[curIdx + 1].getName());
          

          Combine these into one log message

          -

          I find the state machine here somewhat confusing. Is there no clearer way to write it? Maybe an ASCII-art transition diagram would help, or at least, for each state, a list of which states it can transition to and under what circumstances? To paraphrase someone or other, it's "not obviously incorrect" but also not "obviously correct"

          Colin Patrick McCabe added a comment -

          not sure of the logic for EOF: let's say I have two streams, one is tx 1-15, and the other is 1-20. When we sort, they'll be in the order (1-20, 1-15). I then encounter an error at txid #5 in the first stream, so I switch to the second stream. This stream will then return "null" after reading txid #15, even though there are really 5 more txns in the group. Right?

          This is the intended behavior. If the stream is NOT the last one in the edit log, then the reader will notice a gap and throw an exception. To extend your example, if there was a (21 - 30) stream following the (1 - 20) stream, then the gap would be immediately apparent to the reader. This gap would prevent normal startup.

          If the stream is the last (unfinalized) one in the edit log, then we'll simply only believe that there are 15 transactions in total. This may seem like the wrong thing to do, but consider the following scenario:

          1. NameNode writes out transactions 16-20 to the first edit log
          2. NameNode dies WITHOUT acknowledging transactions 16-20 to the clients or writing it to edit log #2
          3. StandbyNameNode tries to take over

          Do you want step 4 to be "StandbyNameNode crashes because the unfinalized edit logs had different lengths" or "StandbyNameNode starts up normally"?

          I don't like using the term "automatic failover" here - because that's the terminology we use for HA. Instead, perhaps something like "We could not find any other edit log which contains transactions following txid %d"?

          Yeah, I guess that was rather confusing. I'll change it to "edit log failover" or something.

          With regard to the state machine, I agree that an ASCII art diagram would help. The state machine itself makes things a lot simpler because otherwise you'd have like a dozen variables interacting in complex ways, as opposed to just curIdx, prevTxId, and prevException.
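          As a rough illustration of the failover loop only (the names below, such as skipUntil(), are invented for this sketch and are not the actual RedundantEditLogInputStream API):

            // Sketch: on a read error, fail over to the next stream and skip forward
            // past the last transaction we already returned to the caller.
            FSEditLogOp nextOp() throws IOException {
              while (true) {
                try {
                  FSEditLogOp op = streams[curIdx].readOp();
                  if (op != null) {
                    prevTxId = op.getTransactionId();
                  }
                  return op;                      // null signals EOF on this stream
                } catch (IOException e) {
                  if (curIdx + 1 >= streams.length) {
                    throw e;                      // nothing left to fail over to
                  }
                  LOG.error("error reading " + streams[curIdx].getName() +
                      "; failing over to " + streams[curIdx + 1].getName(), e);
                  curIdx++;
                  streams[curIdx].skipUntil(prevTxId + 1);  // hypothetical resync helper
                }
              }
            }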

          Colin Patrick McCabe added a comment -
          • add diagram for RedundantEditLogInputStream
          • check an additional precondition for RedundantEditLogInputStream: all streams must start on the same txid
          • fix typos
          • rebase
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12530861/HDFS-3049.027.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2586//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2586//console

          This message is automatically generated.

          Todd Lipcon added a comment -
          +              streams[curIdx].getName() + ".  During automatic failover, " +
          +              "we noticed that all of the remaining edit log streams are " +
          +              "shorter than the current one!  The best " + 
          +              "remaining edit log ends at transaction " + 
          +              newLast + ", but we thought we could read up to transaction " +
          

          You didn't address the comment above about "automatic failover" terminology

          There are also still some "WATERMELON" messages in the patch...

          Colin Patrick McCabe added a comment -

          sorry about that.

          Fixes:

          • remove WATERMELON
          • fix comment about failover to read "edit log failover"
          • add <pre>...</pre> to JavaDoc for RedundantEditLogInputStream (otherwise, JavaDoc might get confused by the greater than and less than symbols, etc.)
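          For example (illustrative only):

            /**
             * State transitions, roughly:
             * <pre>
             *   (stream 0) --error--> (stream 1) --error--> ... --error--> IOException
             * </pre>
             * Without the pre block, javadoc would try to parse the arrows as HTML.
             */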
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12531202/HDFS-3049.028.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2612//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2612//console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          Tests were not executed in the last QA report, so the +1s are false positives. There were some errors in the build.

               [exec] make[2]: *** [target/usr/local/lib/libhdfs.so.0.0.0] Error 1
               [exec] make[1]: *** [CMakeFiles/hdfs.dir/all] Error 2
               [exec] make: *** [all] Error 2
          
          Colin Patrick McCabe added a comment -

          This patch doesn't change libhdfs.so, so I don't think the build failures there are relevant.

          Still, I will resubmit this so that Jenkins can re-run it.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12531281/HDFS-3049.028.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestShortCircuitLocalRead
          org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2614//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2614//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -

          let's retest this now that some tests have been fixed in trunk

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12531353/HDFS-3049.028.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestDatanodeBlockScanner

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2619//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2619//console

          This message is automatically generated.

          Todd Lipcon added a comment -

          Nicholas: did you want to review this before commit? +1 from me – I'll commit it on Monday if there are no further comments.

          Tsz Wo Nicholas Sze added a comment -

          Took a quick look. The patch seems reasonable.

          Todd Lipcon added a comment -

          Committed to trunk, thanks Colin. Let's let this bake a little while before backporting to branch-2 IMO.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2417 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2417/)
          HDFS-3049. During the normal NN startup process, fall back on a different edit log if we see one that is corrupt. Contributed by Colin Patrick McCabe. (Revision 1349114)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1349114
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/RedundantEditLogInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRecovery.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2344 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2344/)
          HDFS-3049. During the normal NN startup process, fall back on a different edit log if we see one that is corrupt. Contributed by Colin Patrick McCabe. (Revision 1349114)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1349114
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/RedundantEditLogInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRecovery.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2366 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2366/)
          HDFS-3049. During the normal NN startup process, fall back on a different edit log if we see one that is corrupt. Contributed by Colin Patrick McCabe. (Revision 1349114)

          Result = FAILURE
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1349114
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/RedundantEditLogInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRecovery.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1074 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1074/)
          HDFS-3049. During the normal NN startup process, fall back on a different edit log if we see one that is corrupt. Contributed by Colin Patrick McCabe. (Revision 1349114)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1349114
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/RedundantEditLogInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRecovery.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1107 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1107/)
          HDFS-3049. During the normal NN startup process, fall back on a different edit log if we see one that is corrupt. Contributed by Colin Patrick McCabe. (Revision 1349114)

          Result = FAILURE
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1349114
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/RedundantEditLogInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRecovery.java
          Arun C Murthy added a comment -

          Looks like this patch made an incompatible change and broke the MR (non-maven) tests; e.g., TestMapredGroupMappingServiceRefresh no longer compiles.

          In any case, we should not remove public constructors from the mini clusters, since we don't know how such changes affect downstream consumers.

          I'll revert this for now.

          Arun C Murthy added a comment -

          Sigh, too many commits to revert. I'll just fix this...

          On the other hand, I just don't see why we added the new parameter enableManagedDfsDirsRedundancy to the constructor - that constructor is never used, and it broke existing clients.
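
          To illustrate the compatibility concern, here is a minimal hypothetical sketch (not the actual MiniDFSCluster source; the class, field, and parameter names are illustrative): the usual way to add an option such as enableManagedDfsDirsRedundancy without breaking downstream consumers is to keep the old public constructor as a deprecated overload that delegates to the new one, rather than changing or removing its signature.

          // Hypothetical sketch only; not the real MiniDFSCluster code.
          // Adds a new option while preserving the old public constructor,
          // so existing callers (e.g. downstream tests) still compile.
          public class MiniClusterSketch {
            private final int numDataNodes;
            private final boolean manageDfsDirs;
            private final boolean manageDfsDirsRedundancy; // newly added option

            /** Original public constructor: kept (and deprecated) for compatibility. */
            @Deprecated
            public MiniClusterSketch(int numDataNodes, boolean manageDfsDirs) {
              this(numDataNodes, manageDfsDirs, true); // safe default for the new flag
            }

            /** New constructor carrying the additional option. */
            public MiniClusterSketch(int numDataNodes, boolean manageDfsDirs,
                                     boolean manageDfsDirsRedundancy) {
              this.numDataNodes = numDataNodes;
              this.manageDfsDirs = manageDfsDirs;
              this.manageDfsDirsRedundancy = manageDfsDirsRedundancy;
            }
          }

          Whether MiniDFSCluster should grow new public constructors at all, rather than only extending its builder, is a separate question; HDFS-3614 resolved it here by removing the unused constructor.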

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1098 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1098/)
          HDFS-3614. Revert unused MiniDFSCluster constructor from HDFS-3049. Contributed by Arun Murthy (Revision 1358825)

          Result = FAILURE
          eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1358825
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2434 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2434/)
          HDFS-3614. Revert unused MiniDFSCluster constructor from HDFS-3049. Contributed by Arun Murthy (Revision 1358825)

          Result = SUCCESS
          eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1358825
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2502 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2502/)
          HDFS-3614. Revert unused MiniDFSCluster constructor from HDFS-3049. Contributed by Arun Murthy (Revision 1358825)

          Result = SUCCESS
          eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1358825
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2452 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2452/)
          HDFS-3614. Revert unused MiniDFSCluster constructor from HDFS-3049. Contributed by Arun Murthy (Revision 1358825)

          Result = FAILURE
          eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1358825
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1131 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1131/)
          HDFS-3614. Revert unused MiniDFSCluster constructor from HDFS-3049. Contributed by Arun Murthy (Revision 1358825)

          Result = SUCCESS
          eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1358825
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2508 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2508/)
          HDFS-3614. Revert unused MiniDFSCluster constructor from HDFS-3049. Contributed by Arun Murthy (Revision 1358825)

          Result = SUCCESS
          eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1358825
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2441 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2441/)
          HDFS-3614. Revert unused MiniDFSCluster constructor from HDFS-3049. Contributed by Arun Murthy (Revision 1358825)

          Result = SUCCESS
          eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1358825
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
          Colin Patrick McCabe added a comment -

          The build failures in the un-mavenized MR tests were handled by Arun in HDFS-3614.

          Todd Lipcon added a comment -

          Reopening for commit to branch-2 (this code is needed for QJM support).

          Todd Lipcon added a comment -

          Attached patch applies to tip of branch-2. I ran the new tests and they passed.

          Colin, mind taking a quick look at this backport when you get a chance? Thanks.

          Aaron T. Myers added a comment -

          +1, the branch-2 patch looks good to me. It's a little smaller than the trunk patch because some of the changes (mostly related to MiniDFSCluster) had already been back-ported to branch-2 separately.

          Colin - if you have a chance, it'd still be good for you to take a look at the branch-2 patch before Todd commits it.

          Colin Patrick McCabe added a comment -

          In RedundantEditLogInputStream, you don't actually need these imports:

          import java.util.Collections;
          import java.util.Iterator;
          import java.util.LinkedList;
          import java.util.List;
          import org.apache.commons.lang.StringUtils;

          Aside from that minor detail, looks good to me.

          Todd Lipcon added a comment -

          Fixed the extra imports and committed to branch-2, thanks for the reviews.


            People

            • Assignee:
              Colin Patrick McCabe
              Reporter:
              Colin Patrick McCabe
            • Votes:
              0
              Watchers:
              14
