ZooKeeper / ZOOKEEPER-663

hudson failure in ZKDatabaseCorruptionTest

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0
    • Component/s: server
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/686/

      java.lang.RuntimeException: Unable to run quorum server
      at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:380)
      at org.apache.zookeeper.test.ZkDatabaseCorruptionTest.testCorruption(ZkDatabaseCorruptionTest.java:99)
      Caused by: java.io.IOException: Invalid magic number 0 != 1514884167
      at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:455)
      at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:471)
      at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:438)
      at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:519)
      at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:145)
      at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:193)
      at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:377)

        Activity

        Patrick Hunt created issue -
        Mahadev konar added a comment -

        This looks like a quorum peer was creating a new txn log file and was shut down in the middle of that. This probably led to corruption of the txn logs in the data directory of one of the quorum peers. We do not currently have a good story for handling corruption of the transaction logs; we depend on admins manually going to the node and deciding how to resolve it.

        As a part of this jira we can add documentation in the forrest docs for now, on how to deal with such situations. Also, the logging needs to change to point out which file was corrupted.
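
        For illustration only, here is a minimal sketch of the kind of header check that produces the "Invalid magic number 0 != 1514884167" message, extended to name the offending file. The value 1514884167 is the four ASCII bytes "ZKLG" read as a big-endian int. The class name TxnLogMagicCheck and its methods are hypothetical and are not the code in the attached patch; a header of 0 is consistent with a zero-filled file whose header was never written, which is an assumption about this particular failure.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of ZooKeeper: checks the first four bytes
// of candidate txn log files and reports which file fails the check.
class TxnLogMagicCheck {
    // "ZKLG" as a big-endian int, i.e. 1514884167 from the stack trace.
    static final int MAGIC =
            ByteBuffer.wrap("ZKLG".getBytes(StandardCharsets.US_ASCII)).getInt();

    // True if the header starts with the expected magic number.
    static boolean hasValidMagic(byte[] header) {
        return header.length >= 4 && ByteBuffer.wrap(header).getInt() == MAGIC;
    }

    // Returns a one-line description that always includes the file name.
    static String describe(Path file) {
        byte[] header = new byte[4];
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            in.readFully(header);
        } catch (EOFException e) {
            return file + ": shorter than 4 bytes, no header present";
        } catch (IOException e) {
            return file + ": unreadable (" + e + ")";
        }
        if (hasValidMagic(header)) {
            return file + ": header ok";
        }
        int found = ByteBuffer.wrap(header).getInt();
        return file + ": Invalid magic number " + found + " != " + MAGIC;
    }

    public static void main(String[] args) {
        // Each argument is a path to a file to inspect.
        for (String arg : args) {
            System.out.println(describe(Paths.get(arg)));
        }
    }
}
```

        Run against the files in a peer's data directory, a sketch like this would let an admin see which log file has the bad header before deciding how to recover the node.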

        Mahadev konar added a comment -

        This patch fixes the logging to mention which file is corrupted and adds forrest docs on handling this kind of failure.

        Mahadev konar made changes -
        Field Original Value New Value
        Attachment ZOOKEEPER-663.patch [ 12438038 ]
        Mahadev konar made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12438038/ZOOKEEPER-663.patch
        against trunk revision 919640.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/129/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/129/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/129/console

        This message is automatically generated.

        Benjamin Reed added a comment -

        No need for a test. It just changes messages and docs.

        Benjamin Reed made changes -
        Hadoop Flags [Reviewed]
        Henry Robinson added a comment -

        I just committed this. Thanks Mahadev!

        Henry Robinson made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Patrick Hunt made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition                     Time In Source Status   Execution Times   Last Executer    Last Execution Date
        Open -> Patch Available        31d 4h 3m               1                 Mahadev konar    05/Mar/10 21:34
        Patch Available -> Resolved    3d 3h 51m               1                 Henry Robinson   09/Mar/10 01:25
        Resolved -> Closed             17d 16h                 1                 Patrick Hunt     26/Mar/10 17:25

          People

          • Assignee: Mahadev konar
          • Reporter: Patrick Hunt
          • Votes: 0
          • Watchers: 1
