ZooKeeper / ZOOKEEPER-609

ObserverTest failure "zk should not be connected expected not same"

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: quorum, server
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      ObserverTest failed when running on an 8-core machine.

      I ran the test as:

      ant -Dtest.junit.output.format=xml -Dtest.output -Dtestcase=AsyncHammerTest clean test-core-java &> test.out

        Activity

        Hudson added a comment -

        Integrated in ZooKeeper-trunk #632 (See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/632/)

        Patrick Hunt added a comment -

        +1. With ZOOKEEPER-630 applied, the tests now pass for me. Thanks, Henry.

        Henry Robinson added a comment -

        Patrick -

        See https://issues.apache.org/jira/browse/ZOOKEEPER-630 - there are duplicate ObserverTest.java files in trunk. This failure is in one removed by that patch.

        Can you try applying 630, then this patch, and then re-running the test?

        Thanks,

        Henry

        Patrick Hunt added a comment -

        Btw, the failure was on JVM 1.6.0_17, run as:

        ant -Dtestcase=ObserverTest clean test-core-java

        This is an 8-core Linux machine (same hardware as described here: http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview#Hardware)

        Patrick Hunt added a comment -

        I re-ran the test with the provided patch and now see an NPE:

        Testcase: testObserver took 30.167 sec
        FAILED
        waiting for server 1 being up
        junit.framework.AssertionFailedError: waiting for server 1 being up
        at org.apache.zookeeper.server.quorum.ObserverTest.testObserver(ObserverTest.java:87)

        Testcase: testSingleObserver took 30.109 sec
        Caused an ERROR
        null
        java.lang.NullPointerException
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.shutdown(QuorumPeerMain.java:147)
        at org.apache.zookeeper.server.quorum.QuorumPeerMainTest$TestQPMain.shutdown(QuorumPeerMainTest.java:62)
        at org.apache.zookeeper.server.quorum.QuorumPeerTestBase$MainThread.shutdown(QuorumPeerTestBase.java:97)
        at org.apache.zookeeper.server.quorum.ObserverTest.testSingleObserver(ObserverTest.java:189)

        Testcase: testLeaderElectionFail took 0.002 sec

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12428141/ZOOKEEPER-609.patch
        against trunk revision 891034.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/27/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/27/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/27/console

        This message is automatically generated.

        Henry Robinson added a comment -

        Found an issue: notification of an event is done before the event is saved, so there's a possibility that the main thread will get woken up and see the old event rather than the new one. This patch fixes that.

        I can't reproduce the issue on either of my test machines, so we will have to rely on Hudson.
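The ordering problem described above can be sketched as follows. This is a minimal illustration, not the code from the attached patch: the class `LastEventHolder`, its methods, and the use of a plain `String` in place of the real `WatchedEvent` state are all hypothetical names chosen so the example compiles on its own. The point it shows is that the event is saved before waiters are notified, with both steps under the same lock, so a woken thread cannot observe the previous event.

```java
// Hypothetical stand-in for the test's watcher; not the actual ObserverTest code.
final class LastEventHolder {
    private final Object lock = new Object();
    private String lastState;   // stands in for the saved event's KeeperState
    private int eventCount;     // number of events published so far

    // Called from the event thread: save the event first, then notify,
    // both while holding the lock.
    void publish(String state) {
        synchronized (lock) {
            lastState = state;
            eventCount++;
            lock.notifyAll();
        }
    }

    // Called from the main test thread: blocks until more than `seen` events
    // have been published, then returns the most recently saved state.
    String awaitNext(int seen, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        synchronized (lock) {
            while (eventCount <= seen) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    throw new IllegalStateException("timed out waiting for event");
                }
                try {
                    lock.wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
            return lastState;
        }
    }
}

final class LastEventHolderDemo {
    public static void main(String[] args) {
        LastEventHolder holder = new LastEventHolder();
        // The event thread publishes one event; the main thread waits for it.
        new Thread(() -> holder.publish("Disconnected")).start();
        System.out.println(holder.awaitNext(0, 5000));
    }
}
```

Waiting on a counter rather than on the bare notification also guards against spurious wakeups, which would otherwise produce the same "saw the old event" symptom.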

        Patrick Hunt added a comment -

        Latest failure on Hudson is this issue: http://hudson.zones.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/606/

        Henry Robinson added a comment -

        The error is due to the fact that the client should stop being connected to a cluster when it is not quorate (the test mimics the failure of one follower in a 2-follower, 1-observer cluster). This error message is printed when an event is received that we are expecting to be the disconnect event, so its KeeperState should not be SyncConnected.

        It is possible that there's an extra message being received before the disconnect event. I'm having difficulties recreating the failure with extra logging dialled in, unfortunately - I'll keep plugging away.
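To make the check described above concrete, here is a small self-contained sketch. It is not the ObserverTest source: `ClientState` is a hypothetical mirror of a few `KeeperState` values, `QuorumLossCheck` and its methods are invented for illustration, and the quorum arithmetic assumes the cluster has two voting servers plus one non-voting observer. The trailing "expected not same" in the reported failure looks like the suffix that JUnit's `assertNotSame` appends to the supplied message, which is why the original text reads oddly; the sketch builds a fuller message instead.

```java
// Hypothetical mirror of a few KeeperState values so the sketch compiles alone.
enum ClientState { SyncConnected, Disconnected, Expired }

final class QuorumLossCheck {
    // Majority rule: strictly more than half of the voting servers must be alive.
    // Observers do not vote, so they are not counted here.
    static boolean hasQuorum(int votingServers, int aliveVoters) {
        return aliveVoters > votingServers / 2;
    }

    // Returns null when the observed state is acceptable after quorum loss,
    // otherwise a descriptive failure message.
    static String describeFailure(ClientState observed) {
        if (observed == ClientState.SyncConnected) {
            return "client still reports SyncConnected after quorum was lost; "
                 + "expected a disconnect event";
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed layout: 2 voting servers, 1 of which has been shut down.
        System.out.println("quorate: " + hasQuorum(2, 1));
        System.out.println(describeFailure(ClientState.SyncConnected));
    }
}
```

If an extra event does arrive before the disconnect, a check of this form fails on that earlier event, which matches the behaviour suspected in the comment above.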

        Patrick Hunt added a comment -

        What does this error message mean, btw? It might be good to also update the text so that it makes more sense.


          People

          • Assignee:
            Henry Robinson
            Reporter:
            Patrick Hunt