  1. ZooKeeper
  2. ZOOKEEPER-3848

Zookeeper upgrade fails due to missing snapshots on branch-3.6


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.6.2
    • Fix Version/s: None
    • Component/s: server
    • Labels: None

    Description

      We tested upgrading a single-node zookeeper from branch-3.4/branch-3.5 to branch-3.6, but the upgraded node failed to start.

      The error message is as follows:

      2020-05-24 00:24:24,996 [myid:1] - ERROR [main:ZooKeeperServerMain@90] - Unexpected exception, exiting abnormally
      java.io.IOException: No snapshot found, but there are log entries. Something is broken!
              at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:281)
              at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:285)
              at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:484)
              at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:655)
              at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:758)
              at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:130)
              at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:159)
              at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:112)
              at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:67)
              at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:140)
              at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90)
      2020-05-24 00:24:24,999 [myid:1] - INFO  [main:ZKAuditProvider@42] - ZooKeeper audit is disabled.
      2020-05-24 00:24:25,001 [myid:1] - ERROR [main:ServiceUtils@42] - Exiting JVM with code 1 
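The exception text describes a data directory that contains transaction log entries but no snapshot. The following is a minimal sketch for checking a data directory for that state before upgrading. It assumes the usual on-disk layout (a `version-2` subdirectory holding `snapshot.<zxid>` and `log.<zxid>` files); the function name `inspect_data_dir` is hypothetical, and this is not ZooKeeper's own recovery logic.

```python
from pathlib import Path


def inspect_data_dir(data_dir):
    """Count snapshot and transaction log files under <data_dir>/version-2.

    Assumes files are named snapshot.<zxid> and log.<zxid>.
    """
    version_dir = Path(data_dir) / "version-2"
    names = [p.name for p in version_dir.iterdir()] if version_dir.is_dir() else []
    snapshots = [n for n in names if n.startswith("snapshot.")]
    logs = [n for n in names if n.startswith("log.")]
    return {
        "snapshots": len(snapshots),
        "logs": len(logs),
        # Logs present with no snapshot is the state the error message describes.
        "logs_without_snapshot": bool(logs) and not snapshots,
    }


if __name__ == "__main__":
    # dataDir value taken from the zoo.cfg in this report.
    print(inspect_data_dir("/tmp/zookeeper"))
```

Running this against the old node's dataDir after Step 2 would show whether the 3.4/3.5 node ever wrote a snapshot.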

      The error can be reproduced through the following steps:

      Step 1: Start a single-node ZooKeeper (built from either branch-3.4 or branch-3.5) with the following configuration (zoo.cfg):

      tickTime=2000
      initLimit=10
      syncLimit=5
      dataDir=/tmp/zookeeper
      clientPort=2181
      server.1=localhost:2888:3888

      Step 2: Stress the node with the ZooKeeper testing tool zk-smoketest (https://github.com/phunt/zk-smoketest.git). We invoked create, set, and get operations but no delete operations, so that the generated data remained on disk.

      Step 3: Upgrade the node to branch-3.6, keeping the same configuration. After the upgrade, ZooKeeper failed to start, as shown in the log above.
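Because the reproduction depends on both builds using an identical zoo.cfg, it may help to generate the file rather than copy it by hand. The sketch below renders the settings from Step 1; the helper names `render_zoo_cfg` and `write_zoo_cfg` are hypothetical, and where the file goes depends on how each build is laid out.

```python
from pathlib import Path

# Settings copied verbatim from Step 1 of this report.
ZOO_CFG = {
    "tickTime": "2000",
    "initLimit": "10",
    "syncLimit": "5",
    "dataDir": "/tmp/zookeeper",
    "clientPort": "2181",
    "server.1": "localhost:2888:3888",
}


def render_zoo_cfg(settings):
    """Render a dict as key=value lines, one per setting, in insertion order."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


def write_zoo_cfg(path, settings=ZOO_CFG):
    """Write the rendered configuration to the given path."""
    Path(path).write_text(render_zoo_cfg(settings))


if __name__ == "__main__":
    print(render_zoo_cfg(ZOO_CFG), end="")
```

Calling `write_zoo_cfg` once for the old build and once for the branch-3.6 build keeps the configuration the same across the upgrade.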

      After reading ZOOKEEPER-3056 and ZOOKEEPER-3513, we added

      zookeeper.snapshot.trust.empty=true

      to the branch-3.6 configuration (zoo.cfg), but startup failed with the same error.


          People

            Assignee: Unassigned
            Reporter: Zhuqi Jin (Zhuqi1108)
