  1. ZooKeeper
  2. ZOOKEEPER-2560

Possible Cluster Unavailability

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • 3.4.8
    • server
    • None
    • Three-node Linux cluster

    Description

      Possible Cluster Unavailability

      I am running a three node ZooKeeper cluster. Each node runs Linux.

      I see the following sequence of system calls when ZooKeeper appends a user data item to the log file:

      1 write("/data/version-2/log.200000001", offset=65, count=12)
      2 write("/data/version-2/log.200000001", offset=77, count=16323)
      3 write("/data/version-2/log.200000001", offset=16400, count=4209)
      4 write("/data/version-2/log.200000001", offset=20609, count=1)
      5 fdatasync("/data//version-2/log.200000001")

      Now, a crash could happen just after operation 4 but before the final fdatasync. In this situation, the file system could persist the 4th operation but fail to persist the 3rd, because no fsync or fdatasync is issued between them and the file system is therefore free to reorder them. In such cases, the ZooKeeper server fails to start with the following messages in its log file:

      [myid:] - INFO [main:QuorumPeerConfig@103] - Reading configuration from: /tmp/zoo2.cfg
      [myid:] - INFO [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 127.0.0.2 to address: /127.0.0.2
      [myid:] - INFO [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 127.0.0.4 to address: /127.0.0.4
      [myid:] - INFO [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 127.0.0.3 to address: /127.0.0.3
      [myid:] - INFO [main:QuorumPeerConfig@331] - Defaulting to majority quorums
      [myid:1] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
      [myid:1] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
      [myid:1] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
      [myid:1] - INFO [main:QuorumPeerMain@127] - Starting quorum peer
      [myid:1] - INFO [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2182
      [myid:1] - INFO [main:QuorumPeer@1019] - tickTime set to 2000
      [myid:1] - INFO [main:QuorumPeer@1039] - minSessionTimeout set to -1
      [myid:1] - INFO [main:QuorumPeer@1050] - maxSessionTimeout set to -1
      [myid:1] - INFO [main:QuorumPeer@1065] - initLimit set to 5
      [myid:1] - INFO [main:FileSnap@83] - Reading snapshot /data/version-2/snapshot.100000002
      [myid:1] - ERROR [main:QuorumPeer@557] - Unable to load database on disk
      java.io.IOException: CRC check failed
      at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:635)
      at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:158)
      at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
      at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:510)
      at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
      at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
      at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
      at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
      2016-04-15 04:00:32,795 [myid:1] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
      java.lang.RuntimeException: Unable to run quorum server
      at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:558)
      at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
      at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
      at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
      at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
      Caused by: java.io.IOException: CRC check failed
      at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:635)
      at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:158)
      at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
      at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:510)
      ... 4 more

      The same happens when the 3rd and 4th writes hit the disk but the 2nd operation does not.
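
      The two failure cases above can be reproduced in a small simulation. This is only a sketch under stated assumptions: the offsets and counts come from the trace, but the filler bytes, the zero-filled holes for writes that never reached disk, and the single Adler-32 checksum over the whole append are simplifications chosen for illustration, not ZooKeeper's actual on-disk format.

```python
import zlib

# The four write() calls from the trace, as (offset, count) pairs.
WRITES = [(65, 12), (77, 16323), (16400, 4209), (20609, 1)]
BASE = 65  # offset at which the append begins

def intended_append() -> bytes:
    """Bytes the application meant to append (filler; any non-zero data works)."""
    return b"\x01" * sum(count for _, count in WRITES)

def disk_after_crash(persisted) -> bytes:
    """File image if only the writes in `persisted` (1-based) reached the disk.

    Assumption: ranges that were never persisted read back as zeros.
    """
    data = intended_append()
    end = max(WRITES[i - 1][0] + WRITES[i - 1][1] for i in persisted)
    disk = bytearray(end)
    for i in persisted:
        offset, count = WRITES[i - 1]
        disk[offset:offset + count] = data[offset - BASE:offset - BASE + count]
    return bytes(disk)

def checksum_ok(disk: bytes) -> bool:
    """Compare the checksum of what is on disk with what was intended."""
    return zlib.adler32(disk[BASE:]) == zlib.adler32(intended_append())

print(checksum_ok(disk_after_crash({1, 2, 3, 4})))  # all writes persisted
print(checksum_ok(disk_after_crash({1, 2, 4})))     # write 3 lost
print(checksum_ok(disk_after_crash({1, 3, 4})))     # write 2 lost
```

      In both crash cases the file has its full length, because the last write persisted, yet part of its contents is missing, so a reader that trusts the length ends up verifying the checksum against a hole.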

      Two nodes of a three-node cluster can easily reach this state, leaving the cluster without a quorum and therefore unavailable. On recovery, ZooKeeper should handle such checksum mismatches gracefully to maintain cluster availability.
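
      One possible shape for such graceful handling is to treat a checksum mismatch at the tail of the log as a torn write: keep every record that verifies and stop at the first one that does not, instead of aborting startup. The sketch below is hypothetical throughout. The record layout (4-byte length, 4-byte Adler-32, payload) and the names `encode_record` and `recover` are invented for illustration and are not ZooKeeper's format or API.

```python
import struct
import zlib

# Hypothetical record header: payload length, then Adler-32 of the payload.
HEADER = struct.Struct(">II")

def encode_record(payload: bytes) -> bytes:
    """Serialize one record in the hypothetical layout."""
    return HEADER.pack(len(payload), zlib.adler32(payload)) + payload

def recover(log: bytes):
    """Return (records, valid_end).

    `records` holds every payload that verifies, in order; `valid_end` is the
    offset at which the first truncated or corrupt record begins.
    """
    records, pos = [], 0
    while pos + HEADER.size <= len(log):
        length, checksum = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        payload = log[start:start + length]
        if len(payload) < length or zlib.adler32(payload) != checksum:
            break  # torn or corrupt record: stop here instead of raising
        records.append(payload)
        pos = start + length
    return records, pos

good = encode_record(b"create /a") + encode_record(b"set /a 1")
torn = bytearray(encode_record(b"x" * 100))
torn[HEADER.size + 50:] = bytes(50)  # second half of the payload never hit disk
records, valid_end = recover(good + bytes(torn))
print(len(records), valid_end == len(good))
```

      A caller could then truncate the file at `valid_end` and fetch the missing transactions from its peers. Whether that is safe depends on the damaged record being the last one in the log and never having been acknowledged, which holds in the scenario above only because the crash preceded the fdatasync; a mismatch in the middle of a log would need different treatment.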

            People

              Assignee: Unassigned
              Reporter: Ramnatthan Alagappan (ramanala)