Uploaded image for project: 'ZooKeeper'
  1. ZooKeeper
  2. ZOOKEEPER-2562

Safely persist renames of *epoch.tmp to *epoch by issuing fsync on parent directory -- Possible cluster unavailability otherwise

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 3.4.7
    • Component/s: None
    • Labels:
      None
    • Environment:

      Three node linux cluster

      Description

      I am running a three node ZooKeeper cluster.

      Renames of acceptedEpoch.tmp to acceptedEpoch and currentEpoch.tmp to currentEpoch have to persisted to disk by explicitly issuing fsync on the parent directory. If not, the rename might not hit the disk immediately and if a crash occurs at this point, then the server would fail to start with the following error in the log. If this happens on two more or nodes, then the cluster can become unavailable.

      [myid:] - INFO [main:QuorumPeerConfig@103] - Reading configuration from: /tmp/zoo2.cfg
      [myid:] - INFO [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 127.0.0.2 to address: /127.0.0.2
      [myid:] - INFO [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 127.0.0.4 to address: /127.0.0.4
      [myid:] - INFO [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 127.0.0.3 to address: /127.0.0.3
      [myid:] - INFO [main:QuorumPeerConfig@331] - Defaulting to majority quorums
      [myid:1] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
      [myid:1] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
      [myid:1] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
      [myid:1] - INFO [main:QuorumPeerMain@127] - Starting quorum peer
      [myid:1] - INFO [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2182
      [myid:1] - INFO [main:QuorumPeer@1019] - tickTime set to 2000
      [myid:1] - INFO [main:QuorumPeer@1039] - minSessionTimeout set to -1
      [myid:1] - INFO [main:QuorumPeer@1050] - maxSessionTimeout set to -1
      [myid:1] - INFO [main:QuorumPeer@1065] - initLimit set to 5
      [myid:1] - INFO [main:FileSnap@83] - Reading snapshot /run/shm/dice-4636/113-98-129-z_majority_RO_OM_0=60_1=55/rdir-0/version-2/snapshot.100000002
      [myid:1] - ERROR [main:QuorumPeer@557] - Unable to load database on disk
      java.io.IOException: The accepted epoch, 1 is less than the current epoch, 2
      at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:554)
      at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
      at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
      at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
      at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
      2016-04-15 03:24:57,144 [myid:1] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
      java.lang.RuntimeException: Unable to run quorum server
      at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:558)
      at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
      at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
      at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
      at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
      Caused by: java.io.IOException: The accepted epoch, 1 is less than the current epoch, 2
      at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:554)
      ... 4 more

      Similarly, when new log file is created, the parent directory needs be explicitly fsynced to persist the log file. Otherwise a data loss might be possible (We have reproduced the above issues). Please see this: https://www.quora.com/Linux/When-should-you-fsync-the-containing-directory-in-addition-to-the-file-itself and http://research.cs.wisc.edu/wind/Publications/alice-osdi14.pdf.

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              ramanala Ramnatthan Alagappan
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated: