ZooKeeper / ZOOKEEPER-1621

ZooKeeper does not recover from crash when disk was full


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.4.3
    • Fix Version/s: None
    • Component/s: server
    • Environment: Ubuntu 12.04, Amazon EC2 instance

    Description

      The disk that ZooKeeper was using filled up. During a snapshot write, I got the following exception:

      2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - Severe unrecoverable error, exiting
      java.io.IOException: No space left on device
      at java.io.FileOutputStream.writeBytes(Native Method)
      at java.io.FileOutputStream.write(FileOutputStream.java:282)
      at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
      at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
      at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
      at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
      at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
      at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
      at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)

      Subsequent attempts to start the server then failed with many exceptions like:

      2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was partial.
      2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
      java.io.EOFException
      at java.io.DataInputStream.readInt(DataInputStream.java:375)
      at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
      at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
      at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
      at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
      at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
      at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
      at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
      at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
      at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
      at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
      at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
      at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
      at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
      at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
      at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
      at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
      at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
      at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
      at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)

      It seems to me that writing the transaction log should be fully atomic to avoid such situations. Is this not the case?
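
      To illustrate the failure in the second trace: the EOFException is thrown while reading the header of a log file, which suggests the file was created but its header was never fully written. Below is a minimal sketch of how a startup check could detect such a file before trying to replay it. The class name, the method name and the 16-byte header size (4-byte magic, 4-byte version, 8-byte database id) are assumptions made for this example; this is not ZooKeeper code and not necessarily what the attached patches do.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of ZooKeeper.
final class TruncatedLogCheck {
    // Assumed header layout: 4-byte magic + 4-byte version + 8-byte database id.
    static final int HEADER_BYTES = 16;

    /** Returns true if the file ends before a complete header could be read. */
    static boolean hasIncompleteHeader(Path logFile) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(logFile))) {
            // readFully throws EOFException if fewer than HEADER_BYTES bytes are available.
            in.readFully(new byte[HEADER_BYTES]);
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path empty = Files.createTempFile("log.", ".empty");
        Path full = Files.createTempFile("log.", ".full");
        try {
            // Simulate a file whose header was written completely.
            Files.write(full, new byte[HEADER_BYTES]);
            System.out.println(hasIncompleteHeader(empty)); // prints "true"
            System.out.println(hasIncompleteHeader(full));  // prints "false"
        } finally {
            Files.deleteIfExists(empty);
            Files.deleteIfExists(full);
        }
    }
}
```

      A recovery path could use a check like this to skip or set aside a header-less file instead of exiting. Whether that is safe depends on whether any acknowledged transactions could live in such a file, which this sketch does not address.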

      Attachments

        1. ZOOKEEPER-1621.patch
          8 kB
          Michi Mutsuzaki
        2. ZOOKEEPER-1621.2.patch
          9 kB
          Abhishek Rai
        3. zookeeper.log.gz
          129 kB
          David Arthur

            People

              Assignee: Michi Mutsuzaki (michim)
              Reporter: David Arthur (mumrah)
              Votes: 7
              Watchers: 29

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h