Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: 3.5.0
    • Component/s: tests
    • Labels:
      None

      Description

      Some of the tests are consistently failing for me and intermittently on hudson.

      Posting discussion from mailing list below.

      Vishal,
      Can you please open a jira for this and mark it as a blocker for 3.4
      release? Looks like it's transient:

      https://builds.apache.org/job/ZooKeeper-trunk/

      The latest build is passing.

      thanks
      mahadev

      On Mon, Jul 11, 2011 at 12:49 PM, Vishal Kher <vishalmlst@gmail.com> wrote:
      > Hi,
      >
      > ant test-core-java is consistently failing for me.
      >
      > The error seems to be either:
      >
      > Testcase: testFollowersStartAfterLeader took 35.577 sec
      > Caused an ERROR
      > Did not connect
      > java.util.concurrent.TimeoutException: Did not connect
      > at
      > org.apache.zookeeper.test.ClientBase$CountdownWatcher.waitForConnected(ClientBase.java:124)
      > at
      > org.apache.zookeeper.test.QuorumTest.testFollowersStartAfterLeader(QuorumTest.java:308)
      > at
      > org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
      >
      > or
      >
      > Testcase: testNoLogBeforeLeaderEstablishment took 8.831 sec
      > Caused an ERROR
      > KeeperErrorCode = ConnectionLoss for /blah
      > org.apache.zookeeper.KeeperException$ConnectionLossException:
      > KeeperErrorCode = ConnectionLoss for /blah
      > at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
      > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
      > at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:761)
      > at
      > org.apache.zookeeper.test.QuorumTest.testNoLogBeforeLeaderEstablishment(QuorumTest.java:385)
      > at
      > org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
      >
      > Looks like the reason why the tests are failing for me is similar to why the
      > tests failed on hudson:
      >
      > 2011-07-11 14:47:26,219 [myid:] - INFO [QuorumPeer[myid=2]/0.0.0.0:11379
      > :Leader@425] - Shutdown called
      > java.lang.Exception: shutdown Leader! reason: Only 0 followers, need 1
      > at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:425)
      > at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:400)
      > at
      > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:729)
      > 2011-07-11 14:47:26,220 [myid:] - INFO [QuorumPeer[myid=2]/0.0.0.0:11379
      > :ZooKeeperServer@416] - shutting down
      >
      > The leader is not able to ping the followers. Has anyone seen this before?
      >
      > Thanks.
      > -Vishal
      >
      > On Sun, Jul 10, 2011 at 6:52 AM, Apache Jenkins Server <
      > jenkins@builds.apache.org> wrote:
      >
      >> See https://builds.apache.org/job/ZooKeeper-trunk/1239/
      >>
      >>
      >> ###################################################################################
      >> ########################## LAST 60 LINES OF THE CONSOLE
      >> ###########################
      >> [...truncated 242795 lines...]
      >> [junit] 2011-07-10 10:57:16,673 [myid:] - INFO
      >> [main:SessionTrackerImpl@206] - Shutting down
      >> [junit] 2011-07-10 10:57:16,673 [myid:] - INFO
      >> [main:PrepRequestProcessor@702] - Shutting down
      >> [junit] 2011-07-10 10:57:16,674 [myid:] - INFO
      >> [main:SyncRequestProcessor@170] - Shutting down
      >> [junit] 2011-07-10 10:57:16,674 [myid:] - INFO
      >> [SyncThread:0:SyncRequestProcessor@152] - SyncRequestProcessor exited!
      >> [junit] 2011-07-10 10:57:16,675 [myid:] - INFO
      >> [main:FinalRequestProcessor@423] - shutdown of request processor complete
      >> [junit] 2011-07-10 10:57:16,674 [myid:] - INFO [ProcessThread(sid:0
      >> cport:-1)::PrepRequestProcessor@133] - PrepRequestProcessor exited loop!
      >> [junit] 2011-07-10 10:57:16,676 [myid:] - INFO [main:ClientBase@227] -
      >> connecting to 127.0.0.1 11221
      >> [junit] ensureOnly:[]
      >> [junit] 2011-07-10 10:57:16,677 [myid:] - INFO [main:ClientBase@428] -
      >> STARTING server
      >> [junit] 2011-07-10 10:57:16,678 [myid:] - INFO
      >> [main:ZooKeeperServer@164] - Created server with tickTime 3000
      >> minSessionTimeout 6000 maxSessionTimeout 60000 datadir
      >> /grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test1139867753736175617.junit.dir/version-2
      >> snapdir
      >> /grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test1139867753736175617.junit.dir/version-2
      >> [junit] 2011-07-10 10:57:16,679 [myid:] - INFO
      >> [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:11221
      >> [junit] 2011-07-10 10:57:16,680 [myid:] - INFO [main:FileSnap@83] -
      >> Reading snapshot
      >> /grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test1139867753736175617.junit.dir/version-2/snapshot.b
      >> [junit] 2011-07-10 10:57:16,683 [myid:] - INFO [main:FileTxnSnapLog@256]
      >> - Snapshotting: b
      >> [junit] 2011-07-10 10:57:16,684 [myid:] - INFO [main:ClientBase@227] -
      >> connecting to 127.0.0.1 11221
      >> [junit] 2011-07-10 10:57:16,685 [myid:] - INFO [NIOServerCxn.Factory:
      >> 0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - Accepted socket
      >> connection from /127.0.0.1:45122
      >> [junit] 2011-07-10 10:57:16,686 [myid:] - INFO [NIOServerCxn.Factory:
      >> 0.0.0.0/0.0.0.0:11221:NIOServerCnxn@815] - Processing stat command from /
      >> 127.0.0.1:45122
      >> [junit] 2011-07-10 10:57:16,686 [myid:] - INFO
      >> [Thread-5:NIOServerCnxn$StatCommand@652] - Stat command output
      >> [junit] 2011-07-10 10:57:16,688 [myid:] - INFO
      >> [Thread-5:NIOServerCnxn@995] - Closed socket connection for client /
      >> 127.0.0.1:45122 (no session established for client)
      >> [junit] ensureOnly:[InMemoryDataTree, StandaloneServer_port]
      >> [junit] expect:InMemoryDataTree
      >> [junit] found:InMemoryDataTree
      >> org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
      >> [junit] expect:StandaloneServer_port
      >> [junit] found:StandaloneServer_port
      >> org.apache.ZooKeeperService:name0=StandaloneServer_port-1
      >> [junit] 2011-07-10 10:57:16,690 [myid:] - INFO
      >> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD
      >> testQuota
      >> [junit] 2011-07-10 10:57:16,690 [myid:] - INFO [main:ClientBase@465] -
      >> tearDown starting
      >> [junit] 2011-07-10 10:57:16,754 [myid:] - INFO [main:ZooKeeper@662] -
      >> Session: 0x13113b1aca50000 closed
      >> [junit] 2011-07-10 10:57:16,754 [myid:] - INFO
      >> [main-EventThread:ClientCnxn$EventThread@495] - EventThread shut down
      >> [junit] 2011-07-10 10:57:16,754 [myid:] - INFO [main:ClientBase@435] -
      >> STOPPING server
      >> [junit] 2011-07-10 10:57:16,755 [myid:] - INFO [NIOServerCxn.Factory:
      >> 0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@224] - NIOServerCnxn factory
      >> exited run method
      >> [junit] 2011-07-10 10:57:16,755 [myid:] - INFO
      >> [main:ZooKeeperServer@416] - shutting down
      >> [junit] 2011-07-10 10:57:16,756 [myid:] - INFO
      >> [main:SessionTrackerImpl@206] - Shutting down
      >> [junit] 2011-07-10 10:57:16,756 [myid:] - INFO
      >> [main:PrepRequestProcessor@702] - Shutting down
      >> [junit] 2011-07-10 10:57:16,757 [myid:] - INFO
      >> [main:SyncRequestProcessor@170] - Shutting down
      >> [junit] 2011-07-10 10:57:16,760 [myid:] - INFO [ProcessThread(sid:0
      >> cport:-1)::PrepRequestProcessor@133] - PrepRequestProcessor exited loop!
      >> [junit] 2011-07-10 10:57:16,762 [myid:] - INFO
      >> [SyncThread:0:SyncRequestProcessor@152] - SyncRequestProcessor exited!
      >> [junit] 2011-07-10 10:57:16,762 [myid:] - INFO
      >> [main:FinalRequestProcessor@423] - shutdown of request processor complete
      >> [junit] 2011-07-10 10:57:16,763 [myid:] - INFO [main:ClientBase@227] -
      >> connecting to 127.0.0.1 11221
      >> [junit] ensureOnly:[]
      >> [junit] 2011-07-10 10:57:16,767 [myid:] - INFO [main:ClientBase@493] -
      >> fdcount after test is: 35 at start it was 24
      >> [junit] 2011-07-10 10:57:16,767 [myid:] - INFO [main:ClientBase@495] -
      >> sleeping for 20 secs
      >> [junit] 2011-07-10 10:57:16,768 [myid:] - INFO [main:ZKTestCase$1@60]
      >> - SUCCEEDED testQuota
      >> [junit] 2011-07-10 10:57:16,768 [myid:] - INFO [main:ZKTestCase$1@55]
      >> - FINISHED testQuota
      >> [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.691 sec
      >>
      >> BUILD FAILED
      >> /grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build.xml:959:
      >> The following error occurred while executing this line:
      >> /grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build.xml:870:
      >> Tests failed!
      >>
      >> Total time: 19 minutes 0 seconds
      >> [FINDBUGS] Skipping publisher since build result is FAILURE
      >> [WARNINGS] Skipping publisher since build result is FAILURE
      >> Recording fingerprints
      >> Archiving artifacts
      >> Recording test results
      >> Publishing Javadoc
      >> Publishing Clover coverage report...
      >> No Clover report will be published due to a Build Failure
      >> Email was triggered for: Failure
      >> Sending email for trigger: Failure
      >>
      >>
      >>
      >>
      >> ###################################################################################
      >> ############################## FAILED TESTS (if any)
      >> ##############################
      >> 2 tests failed.
      >> REGRESSION: org.apache.zookeeper.test.ObserverTest.testObserver
      >>
      >> Error Message:
      >> KeeperErrorCode = ConnectionLoss for /obstest
      >>
      >> Stack Trace:
      >> org.apache.zookeeper.KeeperException$ConnectionLossException:
      >> KeeperErrorCode = ConnectionLoss for /obstest
      >> at
      >> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
      >> at
      >> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
      >> at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:761)
      >> at
      >> org.apache.zookeeper.test.ObserverTest.testObserver(ObserverTest.java:101)
      >> at
      >> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
      >>
      >>
      >> REGRESSION: org.apache.zookeeper.test.ReadOnlyModeTest.testSeekForRwServer
      >>
      >> Error Message:
      >> KeeperErrorCode = ConnectionLoss for /test
      >>
      >> Stack Trace:
      >> org.apache.zookeeper.KeeperException$ConnectionLossException:
      >> KeeperErrorCode = ConnectionLoss for /test
      >> at
      >> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
      >> at
      >> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
      >> at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:761)
      >> at
      >> org.apache.zookeeper.test.ReadOnlyModeTest.testSeekForRwServer(ReadOnlyModeTest.java:213)
      >> at
      >> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)

      1. fail_on_27th_iteration.log.gz
        466 kB
        Eugene Koontz
      2. repeat-script.patch
        0.5 kB
        Eugene Koontz
      3. zk1125.log.gz
        1.28 MB
        Eugene Koontz
      4. ZOOKEEPER-1125.patch
        0.6 kB
        Vishal Kher

          Activity

          Michi Mutsuzaki added a comment -

          We haven't seen these failures for a while now. I'm closing this.

          Patrick Hunt added a comment -

          Is this still an issue with 3.4/trunk? Or should we close this?

          Eugen Paraschiv added a comment -

          Running into this exact problem - any update on this track?
          Thanks.
          Eugen.

          Mahadev konar added a comment -

          I am moving this out to 3.5.0; we can keep this open for further investigation. We don't really need to wait on this for 3.4.

          Vishal Kher added a comment -

          I am not sure what's going on. I will investigate further.
          Eugene - thanks for trying out the patch.

          Hudson added a comment -

          Integrated in ZooKeeper-trunk #1304 (See https://builds.apache.org/job/ZooKeeper-trunk/1304/)
          ZOOKEEPER-1125. Intermittent java core test failures. (Vishar Kher via mahadev)

          mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1170433
          Files :

          • /zookeeper/trunk/CHANGES.txt
          • /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java

          Mahadev konar added a comment -

          I'll leave this open since the issue isn't entirely fixed.

          Mahadev konar added a comment -

          Just committed the patch. Thanks Vishal. I am downgrading the jira to a Major one. I don't think we should block the release with a test case failure that happens rarely.

          Eugene Koontz added a comment -

          Sorry, the second sentence in the above should read: "This test starts a set of Quorum Peers and then shuts them down, one at a time, and starts replacements for the ones that were shut down."

          Eugene Koontz added a comment -

          Just for some additional detail on my own testing: CnxManagerTest is failing in testWorkerThreads(). This test starts a set of Quorum Peers and then shuts them down, one at a time, and starts replacements for the ones that were shut down. Apparently, these replacements sometimes do not come up in a timely fashion.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12494344/fail_on_27th_iteration.log.gz
          against trunk revision 1170365.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/532//console

          This message is automatically generated.

          Eugene Koontz added a comment -

          Unfortunately CnxManagerTest failed on the 27th iteration (please see attached log).

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12493956/ZOOKEEPER-1125.patch
          against trunk revision 1166970.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/521//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/521//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/521//console

          This message is automatically generated.

          Vishal Kher added a comment -

          Submitting patch with increased timeout.

          Vishal Kher added a comment -

          I think the test is failing because:
          a) the test waits only a minute after it restarts a peer to verify the correctness of the cnx manager threads on the peer. It should wait a little longer.
          b) the EC2 environment is introducing some additional variability. I think the timer is expiring before it reaches a minute.

          I have attached a patch that will wait for a maximum of 4 minutes. The test sleeps for 500 ms before calling the method that checks the number of threads on the peer. For fast peers the test will exit immediately, and for slow peers it will wait for at most 4 minutes.

          Eugene, can you please see if the patch works for you? Thanks.
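
For illustration, the bounded wait described above could look roughly like the following. This is a minimal sketch, not the contents of ZOOKEEPER-1125.patch: the class name `PollUntil`, the helper `waitFor`, its parameters, and the stand-in readiness check are all assumptions made for the example. Only the 500 ms interval and the 4 minute ceiling come from the comment.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not taken from CnxManagerTest or the attached patch.
final class PollUntil {

    /**
     * Sleeps intervalMs, then evaluates check; repeats until check passes
     * or timeoutMs has elapsed. Returns true if check passed in time.
     */
    static boolean waitFor(BooleanSupplier check, long timeoutMs, long intervalMs) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        do {
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
            if (check.getAsBoolean()) {
                return true; // fast peers leave after the first interval
            }
        } while (System.nanoTime() - deadline < 0);
        return false; // slow peers give up only once the deadline has passed
    }

    public static void main(String[] args) {
        final long start = System.currentTimeMillis();
        // Stand-in for "the peer has the expected number of worker threads":
        // becomes true a little over a second after start.
        BooleanSupplier peerReady = () -> System.currentTimeMillis() - start > 1200;
        boolean ok = waitFor(peerReady, 4 * 60 * 1000L, 500L);
        System.out.println("condition met: " + ok);
    }
}
```

Compared with a single fixed sleep, a loop like this costs a slow machine at most the full timeout while a fast machine pays only one polling interval.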

          Vishal Kher added a comment -

          Thanks, I will look at the logs.

          Eugene Koontz added a comment -

          Also I should post my "ulimit -a":

          [ec2-user@ip-10-167-22-217 ~]$ ulimit -a
          core file size (blocks, -c) 0
          data seg size (kbytes, -d) unlimited
          scheduling priority (-e) 0
          file size (blocks, -f) unlimited
          pending signals (-i) 136932
          max locked memory (kbytes, -l) 64
          max memory size (kbytes, -m) unlimited
          open files (-n) 1000000
          pipe size (512 bytes, -p) 8
          POSIX message queues (bytes, -q) 819200
          real-time priority (-r) 0
          stack size (kbytes, -s) 8192
          cpu time (seconds, -t) unlimited
          max user processes (-u) 1000000
          virtual memory (kbytes, -v) unlimited
          file locks (-x) unlimited

          Eugene Koontz added a comment -

          Result of running "src/repeat.sh CnxManagerTest > zk1125.log 2>&1" on an Amazon EC2 m2.xlarge.

          Eugene Koontz added a comment -

          Hi Vishal,
          Yes, I am running on an EC2 m2.xlarge. I'm still seeing the eventual failure on CnxManagerTest. I've attached the log file as zk1125.txt.gz (it's large because I set the logging level to DEBUG and there are 43 successful iterations before the failure).

          Vishal Kher added a comment -

          Mahadev,
          all tests passed for me for several runs.

          Eugene, it looks like you are having some timing issues in the test. Can you attach test logs? Are you running this on EC2?

          Eugene Koontz added a comment -

          Hi Mahadev, Camille and Vishal,
          I'm getting an eventual test failure on trunk@296d940f for CnxManagerTest:

          java.lang.AssertionError: Mon Aug 15 19:14:02 UTC 2011 Incorrect number of Worker threads for sid=0 expected 4 found 2
          [junit] at org.junit.Assert.fail(Assert.java:91)
          [junit] at org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads(CnxManagerTest.java:332)
          [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          [junit] at java.lang.reflect.Method.invoke(Method.java:616)
          [junit] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
          [junit] at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
          [junit] at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
          [junit] at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
          [junit] at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
          [junit] at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
          [junit] at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
          [junit] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
          [junit] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
          [junit] at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
          [junit] at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
          [junit] at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
          [junit] at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
          [junit] at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
          [junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
          [junit] at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
          [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
          [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
          [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
          [junit] 2011-08-15 19:14:02,708 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testWorkerThreads

          Mahadev konar added a comment -

          Eugene/Camille/Vishal,
          Any updates on the tests?

          Eugene Koontz added a comment -

          attachment is not a fix for the issue; just for helping diagnose.

          Eugene Koontz added a comment -

          A script to repeatedly run a single test: usage:

          src/repeat.sh ZooKeeperTest

          Eugene Koontz added a comment -

          I am going to help with this. Feeling the core test failure angst.

          Camille Fournier added a comment -

          Actually an update that this is still failing on my home machine, so I need to dig a little deeper.

          Camille Fournier added a comment -

          I have a patch for the rw failure that I can submit tonight. Don't have time right now to look at the other failure though.

          Mahadev konar added a comment -

          Camille/Vishal,
          Any of you working on a patch for this? or volunteering to?

          Camille Fournier added a comment -

          However, I see ReadOnlyModeTest.testConnectionEvents regularly failing in my env, not sure what's up with that.

          Camille Fournier added a comment -

          testSeekForRwServer is failing because it waits only to be connected, not connected in RW mode. This means that if the client connects back to the RO server, the test proceeds as though an RW server were available. The fix is to have the watcher wait for RW connected. I can submit a patch for this later.
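
The difference between "connected" and "connected in RW mode" can be sketched as below. This is an illustration under stated assumptions, not the patch mentioned above: `RwConnectWatcher` and its local `State` enum are hypothetical stand-ins so the example runs without ZooKeeper on the classpath. In the real client the corresponding states are `Watcher.Event.KeeperState.ConnectedReadOnly` and `Watcher.Event.KeeperState.SyncConnected`.

```java
import java.util.function.Predicate;

// Hypothetical watcher; the State enum only mirrors the idea of ZooKeeper's KeeperState.
final class RwConnectWatcher {

    enum State { DISCONNECTED, CONNECTED_READ_ONLY, SYNC_CONNECTED }

    private State state = State.DISCONNECTED;

    /** Called whenever the (simulated) client reports a new connection state. */
    synchronized void process(State newState) {
        state = newState;
        notifyAll();
    }

    /** True once the client is connected in any mode, read-only included. */
    boolean waitForConnected(long timeoutMs) {
        return await(s -> s != State.DISCONNECTED, timeoutMs);
    }

    /** True only once the client is connected to a read-write server. */
    boolean waitForSyncConnected(long timeoutMs) {
        return await(s -> s == State.SYNC_CONNECTED, timeoutMs);
    }

    private synchronized boolean await(Predicate<State> accepted, long timeoutMs) {
        final long deadline = System.currentTimeMillis() + timeoutMs;
        try {
            while (!accepted.test(state)) {
                long left = deadline - System.currentTimeMillis();
                if (left <= 0) {
                    return false; // timed out in a state the caller does not accept
                }
                wait(left);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        RwConnectWatcher watcher = new RwConnectWatcher();
        watcher.process(State.CONNECTED_READ_ONLY);
        // A plain "connected" wait is satisfied by the read-only server...
        System.out.println("connected: " + watcher.waitForConnected(100));
        // ...while an RW-aware wait keeps waiting and eventually times out.
        System.out.println("rw connected: " + watcher.waitForSyncConnected(100));
    }
}
```

A test that is about to issue a write would use the stricter wait, so a reconnect to the read-only server does not count as ready.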


            People

            • Assignee:
              Vishal Kher
              Reporter:
              Vishal Kher
            • Votes:
              2
              Watchers:
              3
