ZooKeeper / ZOOKEEPER-1936

Server exits when unable to create data directory due to race

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.4.6, 3.5.0
    • Fix Version/s: 3.6.0
    • Component/s: server

      Description

      We sometimes see the ZooKeeper server fail to start, with this error in the log:

      [2014-05-27 09:29:48.248] ERROR : -
      .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception,
      exiting abnormally\nexception=\njava.io.IOException: Unable to create data
      directory /home/y/var/zookeeper/version-2\n\tat
      org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
      org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
      org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
      org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
      org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
      org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
      [...]

      A thread dump from the JVM shows this:

      "PurgeTask" daemon prio=10 tid=0x000000000201d000 nid=0x1727 runnable [0x00007f55d7dc7000]
      java.lang.Thread.State: RUNNABLE
      at java.io.UnixFileSystem.createDirectory(Native Method)
      at java.io.File.mkdir(File.java:1310)
      at java.io.File.mkdirs(File.java:1337)
      at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
      at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
      at org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
      at java.util.TimerThread.mainLoop(Timer.java:555)
      at java.util.TimerThread.run(Timer.java:505)

      "zookeeper server" prio=10 tid=0x00000000027df800 nid=0x1715 runnable [0x00007f55d7ed8000]
      java.lang.Thread.State: RUNNABLE
      at java.io.UnixFileSystem.createDirectory(Native Method)
      at java.io.File.mkdir(File.java:1310)
      at java.io.File.mkdirs(File.java:1337)
      at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
      at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
      at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
      at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
      at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
      at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
      [...]

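      It seems that when autopurge is enabled (as it is in our case), the purge task can run at the same time as the server itself starts. The FileTxnSnapLog constructor checks whether the data directory exists and creates it if it does not. When both threads do this at once, each sees the directory as missing and calls mkdirs(). One call creates the directory; the other returns false because the directory already exists by then, and that thread throws the IOException. When the losing thread is the server's main thread, the server exits the JVM.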
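
      One way to make the directory creation tolerant of this race is to treat a false return from mkdirs() as an error only if the directory still does not exist afterwards. The sketch below illustrates that idea; it is not taken from the attached patches. The class DataDirRace and the helper ensureDirectory are hypothetical names, and the two threads merely stand in for the "PurgeTask" and "zookeeper server" threads from the thread dump.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration class; not part of ZooKeeper.
class DataDirRace {

    // Creates dir if needed. mkdirs() returns false both on real failure and
    // when another thread created the directory first, so a false return is
    // treated as an error only if the directory is still missing afterwards.
    static void ensureDirectory(File dir) throws IOException {
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Unable to create data directory " + dir);
        }
    }

    public static void main(String[] args) throws Exception {
        File base = Files.createTempDirectory("zk-race").toFile();
        File dataDir = new File(base, "version-2");
        CountDownLatch start = new CountDownLatch(1);

        // Both threads wait on the latch so that they attempt the
        // creation as close to simultaneously as possible.
        Runnable task = () -> {
            try {
                start.await();
                ensureDirectory(dataDir);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        };
        Thread purge = new Thread(task, "PurgeTask");
        Thread server = new Thread(task, "zookeeper server");
        purge.start();
        server.start();
        start.countDown();
        purge.join();
        server.join();

        System.out.println("data directory exists: " + dataDir.isDirectory());
    }
}
```

      With this check, whichever thread loses the race simply finds the directory in place and continues. An alternative under the same assumption would be java.nio.file.Files.createDirectories, which does not fail when the directory already exists.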

        Attachments

        1. ZOOKEEPER-1936.v5.patch
          2 kB
          Ted Yu
        2. ZOOKEEPER-1936.v4.patch
          2 kB
          Ted Yu
        3. ZOOKEEPER-1936.v3.patch
          1 kB
          Ted Yu
        4. ZOOKEEPER-1936.v3.patch
          1 kB
          Ted Yu
        5. ZOOKEEPER-1936.v2.patch
          0.9 kB
          Ted Yu
        6. ZOOKEEPER-1936.patch
          1 kB
          Andrew Kyle Purtell
        7. ZOOKEEPER-1936.branch-3.4.patch
          1 kB
          Ted Yu

            People

            • Assignee: Ted Yu
            • Reporter: Harald Musum
            • Votes: 3
            • Watchers: 16

                Time Tracking

                • Estimated: Not Specified
                • Remaining: 0h
                • Logged: 1h 10m