Uploaded image for project: 'ZooKeeper'
  1. ZooKeeper
  2. ZOOKEEPER-1653

zookeeper fails to start because of inconsistent epoch

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.4.5
    • Fix Version/s: 3.4.6
    • Component/s: quorum
    • Labels:
      None
    • Release Note:
      ZOOKEEPER-1549.patch should fix this issue in 3.5 branch.

      Description

      It looks like QuorumPeer.loadDataBase() could fail if the server was restarted after zk.takeSnapshot() but before finishing self.setCurrentEpoch(newEpoch) in Learner.java.

      case Leader.NEWLEADER: // it will be NEWLEADER in v1.0
          zk.takeSnapshot();
          self.setCurrentEpoch(newEpoch); // <<< got restarted here
          snapshotTaken = true;
          writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), true);
          break;
      

      The server fails to start because currentEpoch is still 1 but the last processed zkid from the snapshot has been updated.

      2013-02-20 13:45:02,733 5543 [pool-1-thread-1] ERROR org.apache.zookeeper.server.quorum.QuorumPeer  - Unable to load database on disk
      java.io.IOException: The current epoch, 1, is older than the last zxid, 8589934592
              at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:439)
              at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:413)
              ...
      
      $ find datadir                                     
      datadir
      datadir/version-2
      datadir/version-2/currentEpoch.tmp
      datadir/version-2/acceptedEpoch
      datadir/version-2/snapshot.0
      datadir/version-2/currentEpoch
      datadir/version-2/snapshot.200000000
      
      $ cat datadir/version-2/currentEpoch.tmp
      2%
      $ cat datadir/version-2/acceptedEpoch
      2%
      $ cat datadir/version-2/currentEpoch
      1%
      
      1. ZOOKEEPER-1653.3.4.patch
        12 kB
        Michi Mutsuzaki
      2. ZOOKEEPER-1653.3.4.patch
        12 kB
        Michi Mutsuzaki
      3. ZOOKEEPER-1653.3.4.patch
        12 kB
        Michi Mutsuzaki
      4. ZOOKEEPER-1653.patch
        12 kB
        Michi Mutsuzaki
      5. ZOOKEEPER-1653.patch
        12 kB
        Michi Mutsuzaki

        Issue Links

          Activity

          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12575101/ZOOKEEPER-1653.patch
          against trunk revision 1458648.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1436//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1436//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1436//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575101/ZOOKEEPER-1653.patch against trunk revision 1458648. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1436//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1436//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1436//console This message is automatically generated.
          Hide
          michim Michi Mutsuzaki added a comment -

          Addressed findbugs warnings.

          Show
          michim Michi Mutsuzaki added a comment - Addressed findbugs warnings.
          Hide
          michim Michi Mutsuzaki added a comment -

          Patch for 3.4 branch.

          Show
          michim Michi Mutsuzaki added a comment - Patch for 3.4 branch.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12575124/ZOOKEEPER-1653-3.4.patch
          against trunk revision 1458648.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1438//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575124/ZOOKEEPER-1653-3.4.patch against trunk revision 1458648. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1438//console This message is automatically generated.
          Hide
          michim Michi Mutsuzaki added a comment -

          How do I tell the buildbot which branch a certain patch is for?

          --Michi

          Show
          michim Michi Mutsuzaki added a comment - How do I tell the buildbot which branch a certain patch is for? --Michi
          Hide
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12575121/ZOOKEEPER-1653.patch
          against trunk revision 1458648.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1437//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1437//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1437//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575121/ZOOKEEPER-1653.patch against trunk revision 1458648. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1437//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1437//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1437//console This message is automatically generated.
          Hide
          michim Michi Mutsuzaki added a comment -

          I shouldn't have deleted the updatingEpoch file in the finally block. It might get executed even when the thread gets interrupted in the try block before creating the currentEpoch file.

          --Michi

          Show
          michim Michi Mutsuzaki added a comment - I shouldn't have deleted the updatingEpoch file in the finally block. It might get executed even when the thread gets interrupted in the try block before creating the currentEpoch file. --Michi
          Hide
          fpj Flavio Junqueira added a comment -

          Hi Michi, This issue actually escaped my attention, I'm sorry. I think this one is actually superseded by ZOOKEEPER-1549. Do you agree?

          Show
          fpj Flavio Junqueira added a comment - Hi Michi, This issue actually escaped my attention, I'm sorry. I think this one is actually superseded by ZOOKEEPER-1549 . Do you agree?
          Hide
          michim Michi Mutsuzaki added a comment -

          Yes, I think ZOOKEEPER-1549-3.4.patch will fix this issue.

          Show
          michim Michi Mutsuzaki added a comment - Yes, I think ZOOKEEPER-1549 -3.4.patch will fix this issue.
          Hide
          fpj Flavio Junqueira added a comment -

          Should this jira be a blocker for 3.4.6? Michi Mutsuzaki?

          Show
          fpj Flavio Junqueira added a comment - Should this jira be a blocker for 3.4.6? Michi Mutsuzaki ?
          Hide
          michim Michi Mutsuzaki added a comment -

          Yeah if ZOOKEEPER-1549 is not fixed in 3.4.6, we should fix this in 3.4.6. I'll rebase and resubmit the patch.

          Show
          michim Michi Mutsuzaki added a comment - Yeah if ZOOKEEPER-1549 is not fixed in 3.4.6, we should fix this in 3.4.6. I'll rebase and resubmit the patch.
          Hide
          michim Michi Mutsuzaki added a comment -

          This patch is for 3.4 branch.

          Show
          michim Michi Mutsuzaki added a comment - This patch is for 3.4 branch.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12613962/ZOOKEEPER-1653.3.4.patch
          against trunk revision 1541810.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1768//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12613962/ZOOKEEPER-1653.3.4.patch against trunk revision 1541810. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1768//console This message is automatically generated.
          Hide
          fpj Flavio Junqueira added a comment -

          If we can't do any of the operations related to the updating file, then we shouldn't keep going, right? Say we fail to create the fail and the server keeps executing. In this case we can fall into the same problem we are discussing here. I think we should either throw an exception or exit the server.

          Show
          fpj Flavio Junqueira added a comment - If we can't do any of the operations related to the updating file, then we shouldn't keep going, right? Say we fail to create the fail and the server keeps executing. In this case we can fall into the same problem we are discussing here. I think we should either throw an exception or exit the server.
          Hide
          michim Michi Mutsuzaki added a comment -

          Ok that sounds reasonable. I'll change the patch so that it throws an IOException if any of the operations on the updating file fails.

          Show
          michim Michi Mutsuzaki added a comment - Ok that sounds reasonable. I'll change the patch so that it throws an IOException if any of the operations on the updating file fails.
          Hide
          michim Michi Mutsuzaki added a comment -

          Throw an IOException if any of the operations on the updating file fails.

          Show
          michim Michi Mutsuzaki added a comment - Throw an IOException if any of the operations on the updating file fails.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12614162/ZOOKEEPER-1653.3.4.patch
          against trunk revision 1542355.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1773//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614162/ZOOKEEPER-1653.3.4.patch against trunk revision 1542355. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1773//console This message is automatically generated.
          Hide
          rgs Raul Gutierrez Segales added a comment -

          Nit in:

          +    static void writeLongToFile(File file, long value) throws IOException {
          +        AtomicFileOutputStream out = new AtomicFileOutputStream(file);
          +        BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(out));
          +        boolean aborted = false;
          +        try {
          +            bw.write(Long.toString(value));
          +            bw.flush();
          +            out.flush();
          +            out.close();
          +        } catch (IOException e) {
          +            LOG.error("Failed to write new file " + file, e);
          +            out.abort();
          +            throw e;
          +        }
          +    }
          

          aborted is not used.

          Nit in:

          +            LOG.info("Validating current epoch: " + servers.mt[i].dataDir);
          

          use {} instead of concatenating.

          Nit:

          +        // Shut down the cluster
          

          should be "Shutdown the cluster".

          In src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerTestBase.java:

          +        CountDownLatch mainFailed;
          

          is assigned and modified but never asserted or checked?

          Show
          rgs Raul Gutierrez Segales added a comment - Nit in: + static void writeLongToFile(File file, long value) throws IOException { + AtomicFileOutputStream out = new AtomicFileOutputStream(file); + BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(out)); + boolean aborted = false; + try { + bw.write(Long.toString(value)); + bw.flush(); + out.flush(); + out.close(); + } catch (IOException e) { + LOG.error("Failed to write new file " + file, e); + out.abort(); + throw e; + } + } aborted is not used. Nit in: + LOG.info("Validating current epoch: " + servers.mt[i].dataDir); use {} instead of concatenating. Nit: + // Shut down the cluster should be "Shutdown the cluster". In src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerTestBase.java: + CountDownLatch mainFailed; is assigned and modified but never asserted or checked?
          Hide
          rgs Raul Gutierrez Segales added a comment -

          I take back the last comment, I carelessly overlooked the inheriting class.

          Show
          rgs Raul Gutierrez Segales added a comment - I take back the last comment, I carelessly overlooked the inheriting class.
          Hide
          rgs Raul Gutierrez Segales added a comment -

          One more nit in src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java:

          +        currentEpochFile = new File(new File(follower.dataDir, "version-2"),
          +                                    "currentEpoch");
          ....
          +        File updatingEpochFile = new File(
          +                new File(follower.dataDir, "version-2"),
          +                QuorumPeer.UPDATING_EPOCH_FILENAME);
          

          could be abbreviated with:

          +        File followerDataDir = new File(follower.dataDir, "version-2");
          +        currentEpochFile = new File(followerDataDir, "currentEpoch");
          ....
          +        File updatingEpochFile = new File(followerDataDir, QuorumPeer.UPDATING_EPOCH_FILENAME);
          

          Also - should there be a constant for currentEpoch too?

          Show
          rgs Raul Gutierrez Segales added a comment - One more nit in src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java: + currentEpochFile = new File(new File(follower.dataDir, "version-2"), + "currentEpoch"); .... + File updatingEpochFile = new File( + new File(follower.dataDir, "version-2"), + QuorumPeer.UPDATING_EPOCH_FILENAME); could be abbreviated with: + File followerDataDir = new File(follower.dataDir, "version-2"); + currentEpochFile = new File(followerDataDir, "currentEpoch"); .... + File updatingEpochFile = new File(followerDataDir, QuorumPeer.UPDATING_EPOCH_FILENAME); Also - should there be a constant for currentEpoch too?
          Hide
          vinayrpet Vinayakumar B added a comment -

          Apart from Raul Gutierrez Segales comments which are minor nits in Test,

          +1 for changes in src.

          Show
          vinayrpet Vinayakumar B added a comment - Apart from Raul Gutierrez Segales comments which are minor nits in Test, +1 for changes in src.
          Hide
          fpj Flavio Junqueira added a comment -

          Canceling patch so that comments can be addressed.

          Show
          fpj Flavio Junqueira added a comment - Canceling patch so that comments can be addressed.
          Hide
          rakeshr Rakesh R added a comment -

          Good work Michi Mutsuzaki. I've just one comment.

          In the patch, updating.createNewFile() api will return false and throws IOException, when the QuorumPeer.UPDATING_EPOCH_FILENAME file already exists. One possible case could be, say previously the system crashed before deleting the file.

          I think we should improve the check as follows:

          if (!updating.exists() && !updating.createNewFile()) {                 
          
          Show
          rakeshr Rakesh R added a comment - Good work Michi Mutsuzaki . I've just one comment. In the patch, updating.createNewFile() api will return false and throws IOException, when the QuorumPeer.UPDATING_EPOCH_FILENAME file already exists. One possible case could be, say previously the system crashed before deleting the file. I think we should improve the check as follows: if (!updating.exists() && !updating.createNewFile()) {
          Hide
          vinayrpet Vinayakumar B added a comment -

          Thanks rakesh for joining.
          UpdatingEpoch file will be deleted when the snapshot loaded during startup, and at that time currentEpoch will also be updated before deletion.

          So by the time of next snapshot this file should not be there in any case.

          Show
          vinayrpet Vinayakumar B added a comment - Thanks rakesh for joining. UpdatingEpoch file will be deleted when the snapshot loaded during startup, and at that time currentEpoch will also be updated before deletion. So by the time of next snapshot this file should not be there in any case.
          Hide
          rakeshr Rakesh R added a comment -

          Hi vinay,
          In the patch I could see a conditional deletion of 'UpdatingEpoch' file. Here it will delete this file only when the following condition satifies. I was thinking about a case, where the condition is not satified, now the file will not be deleted and will affect the flow which I mentioned earlier. IMHO, it is safe to have 'updating.exists()' check.

          +                if (epochOfZxid > currentEpoch && updating.exists()) {
                                 //........
          +                    if (!updating.delete()) {
          +                        throw new IOException("Failed to delete " +
          +                                              updating.toString());
          +                    }
          +                }
          
          Show
          rakeshr Rakesh R added a comment - Hi vinay, In the patch I could see a conditional deletion of 'UpdatingEpoch' file. Here it will delete this file only when the following condition satifies. I was thinking about a case, where the condition is not satified, now the file will not be deleted and will affect the flow which I mentioned earlier. IMHO, it is safe to have 'updating.exists()' check. + if (epochOfZxid > currentEpoch && updating.exists()) { //........ + if (!updating.delete()) { + throw new IOException( "Failed to delete " + + updating.toString()); + } + }
          Hide
          vinayrpet Vinayakumar B added a comment -

          Oh!! Yes. You are correct I missed it. If server crashes just after self.setCurrentEpoch(newEpoch); then that is very much possible.

          Show
          vinayrpet Vinayakumar B added a comment - Oh!! Yes. You are correct I missed it. If server crashes just after self.setCurrentEpoch(newEpoch); then that is very much possible.
          Hide
          michim Michi Mutsuzaki added a comment -

          Thank you guys for all the feedback. I addressed all the comments except for using variable substitution for logging in QuorumPeerMainTest. QuorumPeerMainTest uses log4j.WriterAppender, and it hasn't migrated to slf4j, so I'm leaving the logging to use string concatenation for now.

          Show
          michim Michi Mutsuzaki added a comment - Thank you guys for all the feedback. I addressed all the comments except for using variable substitution for logging in QuorumPeerMainTest. QuorumPeerMainTest uses log4j.WriterAppender, and it hasn't migrated to slf4j, so I'm leaving the logging to use string concatenation for now.
          Hide
          vinayrpet Vinayakumar B added a comment -

          +1, Latest patch looks good. Thanks Michi.

          Show
          vinayrpet Vinayakumar B added a comment - +1, Latest patch looks good. Thanks Michi.
          Hide
          fpj Flavio Junqueira added a comment -

          +1, good job. Michi Mutsuzaki! Committed revision 1545883.

          Show
          fpj Flavio Junqueira added a comment - +1, good job. Michi Mutsuzaki ! Committed revision 1545883.
          Hide
          fpj Flavio Junqueira added a comment -

          Closing issues after releasing 3.4.6.

          Show
          fpj Flavio Junqueira added a comment - Closing issues after releasing 3.4.6.
          Hide
          rakeshr Rakesh R added a comment -

          It seems this issue is occurring in 3.5 branch also. I could see the chance of not updating the currentEpoch file after the successful zk.takeSnapshot() logic. Please see Learner.java.

          In release note its mentioned ZOOKEEPER-1549 task will be used to address this case in branch 3.5. IIUC, that issue is focussing to solve the differences between the ZooKeeper servers, but this is a situation occurred within a server, am I missing anything?

          IMHO, we can re-open the issue and merge the fix to 3.5 branch

          Show
          rakeshr Rakesh R added a comment - It seems this issue is occurring in 3.5 branch also. I could see the chance of not updating the currentEpoch file after the successful zk.takeSnapshot() logic. Please see Learner.java . In release note its mentioned ZOOKEEPER-1549 task will be used to address this case in branch 3.5. IIUC, that issue is focussing to solve the differences between the ZooKeeper servers, but this is a situation occurred within a server, am I missing anything? IMHO, we can re-open the issue and merge the fix to 3.5 branch
          Hide
          brahmareddy Brahma Reddy Battula added a comment -

          Rakesh R I think, you are correct. can we reopen this issue ZK-3.5 branch..? we had faced same issue in 3.5 branch.

          Show
          brahmareddy Brahma Reddy Battula added a comment - Rakesh R I think, you are correct. can we reopen this issue ZK-3.5 branch..? we had faced same issue in 3.5 branch.
          Hide
          arshad.mohammad Mohammad Arshad added a comment -

          Created new jira ZOOKEEPER-2354 for merging this important fix to master and 3.5 branch.

          Show
          arshad.mohammad Mohammad Arshad added a comment - Created new jira ZOOKEEPER-2354 for merging this important fix to master and 3.5 branch.

            People

            • Assignee:
              michim Michi Mutsuzaki
              Reporter:
              michim Michi Mutsuzaki
            • Votes:
              1 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development