ZOOKEEPER-1557: jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.5, 3.5.0
    • Fix Version/s: 3.4.6, 3.5.0
    • Component/s: server, tests
    • Labels:
      None
    • Release Note:
      Committed to 3.4.6/trunk. Thanks Eugene.

      Description

      Failure of testBadSaslAuthNotifiesWatch on the jenkins jdk7 job:

      https://builds.apache.org/job/ZooKeeper-trunk-jdk7/407/

      Haven't seen this before.

      1. jstack.out
        13 kB
        Eugene Koontz
      2. SaslAuthFailTest.log
        2 kB
        Eugene Koontz
      3. ZOOKEEPER-1557.patch
        6 kB
        Patrick Hunt
      4. ZOOKEEPER-1557.patch
        6 kB
        Eugene Koontz

          Activity

          Flavio Junqueira added a comment -

          Closing issues after releasing 3.4.6.

          Hudson added a comment -

          SUCCESS: Integrated in ZooKeeper-trunk #2099 (See https://builds.apache.org/job/ZooKeeper-trunk/2099/)
          ZOOKEEPER-1557. jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch (Eugene Koontz via phunt) (phunt: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1535251)

          • /zookeeper/trunk/CHANGES.txt
          • /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/SaslAuthFailNotifyTest.java
          • /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/SaslAuthFailTest.java
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12609963/ZOOKEEPER-1557.patch
          against trunk revision 1534844.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1720//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1720//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1720//console

          This message is automatically generated.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12609963/ZOOKEEPER-1557.patch
          against trunk revision 1534844.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1719//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1719//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1719//console

          This message is automatically generated.

          Patrick Hunt added a comment -

          Same patch file, just re-uploaded so the date will be recent (and qabot picks it up).

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12548122/SaslAuthFailTest.log
          against trunk revision 1534844.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1718//console

          This message is automatically generated.

          Patrick Hunt added a comment -

          The original patch seems like a reasonable approach based on what Thawan is saying. Kicked off qabot; if that passes I'll commit this (unless there are objections).

          Thawan Kooburat added a comment -

          If the JDK7 tests fail intermittently (but not on JDK6), it is probably due to interference between the unit tests in the same file when they run in a different order. For AutoResetWithPending, it was due to JVM flag leakage from one test to another. A simple fix is to separate the tests as Eugene suggests, or to actually track down the interference and clean up each test properly.
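
The flag-leakage scenario described above can be sketched roughly as follows. This is a minimal illustration, not code from the patch: the class name, the property name `example.disableAutoReset`, and the helper methods are all hypothetical. It shows one test leaving a JVM-wide system property set, a second test being affected by it, and a teardown-style wrapper that restores the properties so execution order no longer matters.

```java
import java.util.Properties;

public class PropertyLeakSketch {
    // Hypothetical flag name, used only for illustration.
    static final String FLAG = "example.disableAutoReset";

    // Simulates a test that sets a JVM-wide flag and forgets to clear it.
    static void leakyTest() {
        System.setProperty(FLAG, "true");
    }

    // Simulates a test whose outcome depends on the flag being unset.
    static boolean dependentTestPasses() {
        return !Boolean.getBoolean(FLAG);
    }

    // Runs a test body, then restores the system properties that were
    // present before it ran, as a per-test teardown could do.
    static void runIsolated(Runnable test) {
        Properties saved = (Properties) System.getProperties().clone();
        try {
            test.run();
        } finally {
            System.setProperties(saved);
        }
    }

    public static void main(String[] args) {
        // Order "leaky, then dependent" without cleanup: the flag leaks.
        leakyTest();
        System.out.println("no cleanup, dependent test passes: " + dependentTestPasses());

        // Same order with cleanup: the flag does not survive the first test.
        System.clearProperty(FLAG);
        runIsolated(PropertyLeakSketch::leakyTest);
        System.out.println("with cleanup, dependent test passes: " + dependentTestPasses());
    }
}
```

Splitting the tests into separate files sidesteps the problem when each test class runs in its own JVM, whereas restoring state after each test addresses the interference itself.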

          Patrick Hunt added a comment -

          I recently re-enabled the jdk7 job on Apache jenkins. Over the last week or two we've seen two failures.

          This is still occurring on trunk/branch34.

          https://builds.apache.org/job/ZooKeeper-trunk-jdk7/678/
          https://builds.apache.org/job/ZooKeeper_branch34_jdk7/377/

          Eugene Koontz, any chance you can look into this further? jdk6 is eol so more and more folks will need jdk7. Thanks!

          Patrick Hunt added a comment -

          I checked the zookeeper-trunk-jdk7 build and it hasn't run since June, which is strange.

          The job was pinned to hadoop6, which now seems to be gone. I updated the config for this job to use the standard set of hosts rather than hadoop6 exclusively.

          Flavio Junqueira added a comment -

          Moving it to 3.5.0.

          Flavio Junqueira added a comment -

          I checked the zookeeper-trunk-jdk7 build and it hasn't run since June, which is strange. Before June, the builds that failed were due to this test:

          org.apache.zookeeper.server.TruncateCorruptionTest.testTransactionLogCorruption
          

          I don't think we have observed this problem since the last comments from Patrick Hunt, Mahadev konar, and Eugene Koontz late last year, so I wonder if this is still an issue. I don't want to hold 3.4.6 because of this issue, so if there is no reason or interest in pursuing it further, I'd like to push it to 3.5.0 or simply resolve it for now. This issue is not blocking 3.4.6 quite yet, but I'm trying to get to a resolution for all issues marked for 3.4.6.

          Flavio Junqueira added a comment -

          I was wondering if we should consider this issue for 3.4.6 or not. Apparently there were issues at the time we released 3.4.5, so I wonder if we should consider it for 3.4.6 as it is currently marked.

          Patrick Hunt added a comment -

          Also here: https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-jdk7/476/testReport/junit/org.apache.zookeeper.test/WatcherTest/testWatchAutoResetWithPending/

          Patrick Hunt added a comment -

          Watcher test continues to fail occasionally - latest here: https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-openjdk7/7/testReport/junit/org.apache.zookeeper.test/WatcherTest/testWatcherAutoResetDisabledWithLocal/

          Mahadev konar added a comment -

          Thanks Eugene .. Interesting

          Eugene Koontz added a comment -

          Hi Mahadev,
          It sounds fine to me to release 3.4.5 as is. I'm sure we'll eventually figure out what's going on with this. Searching for "Junit and JDK7" I found this: http://wiki.apidesign.org/wiki/OrderOfElements ; might be relevant.

          -Eugene

          Mahadev konar added a comment - - edited

          Thanks Eugene for taking a look at it. Given your analysis above it doesn't look like we have full knowledge of what's causing the issue. Given that this is not SASL related and could be related to how our test framework runs, I think we can move this out to 3.4.6 and get 3.4.5 out the door with what we have now. What do you think?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12548122/SaslAuthFailTest.log
          against trunk revision 1391526.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1207//console

          This message is automatically generated.

          Eugene Koontz added a comment -

          Tail end of the test's log. Note the line:

          [junit] 2012-10-06 13:26:06,001 [myid:] - INFO  [SessionTracker:SessionTrackerImpl@162] - SessionTrackerImpl exited loop!
          

          This line is only reachable if running=false, or if an InterruptedException was caught, which would log an "Unexpected interruption" error message, and that message does not appear.

          Eugene Koontz added a comment -

          jstack of test's JVM - note the thread waiting in SessionTrackerImpl.java:146, which is inside while(running), indicating that for this thread, running==true.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12548109/ZOOKEEPER-1557.patch
          against trunk revision 1391526.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1206//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1206//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1206//console

          This message is automatically generated.

          Eugene Koontz added a comment -

          I can reproduce the problem - running SaslAuthFailTest with the following JDK and JRE:

          $ javac -version
          javac 1.7.0_07
          $ java -version
          java version "1.7.0_07"
          Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
          Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
          

          with ant -Dtest.output=$test_output -Dtestcase=SaslAuthFailTest junit.run.

          This eventually produces the same failure as Jenkins build 407 in the Description. It usually takes several iterations (fewer than 20) to see the failure.

          Separating the two tests in SaslAuthFailTest into two files seems to fix the problem. I'm not sure why, though. I spent some time looking at SessionTrackerImpl.java. The server's SessionTrackerImpl thread seems to be unable to exit from the while(running) loop, even though the volatile boolean running becomes false. If I run jstack on the test JVM's process, it looks like there are actually two SessionTrackerImpl threads running, which seems wrong to me. This led me to the workaround of separating the two tests into different files, which is shown in the attached patch.
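
As an illustration only, and not a diagnosis of the actual failure, the sketch below shows how a jstack could still show a thread inside a `while (running)` loop after `running` has been set to false: if two tracker instances exist, each has its own flag, and stopping one leaves the other looping. The `Tracker` class here is a hypothetical stand-in, not ZooKeeper's SessionTrackerImpl, and the timings are arbitrary.

```java
public class TrackerLoopSketch {
    // Hypothetical stand-in for a tracker thread with a volatile stop flag.
    static class Tracker extends Thread {
        volatile boolean running = true;

        Tracker(String name) {
            super(name);
            setDaemon(true); // let the JVM exit even if a tracker is never stopped
        }

        @Override
        public void run() {
            while (running) {
                try {
                    Thread.sleep(50); // periodic wake-up to re-check the flag
                } catch (InterruptedException e) {
                    return;
                }
            }
        }

        void shutdown() {
            running = false; // visible to the loop because the field is volatile
        }
    }

    // Starts two trackers, stops only the second, and reports which are alive.
    static boolean[] demo() {
        Tracker first = new Tracker("tracker-1");
        Tracker second = new Tracker("tracker-2");
        first.start();
        second.start();
        second.shutdown();
        try {
            second.join(2000); // the stopped tracker exits on its next wake-up
            first.join(200);   // the other tracker keeps looping, so this times out
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] { first.isAlive(), second.isAlive() };
    }

    public static void main(String[] args) {
        boolean[] alive = demo();
        System.out.println("tracker-1 alive: " + alive[0]);
        System.out.println("tracker-2 alive: " + alive[1]);
    }
}
```

If something like this is happening, a leftover tracker from the first test in the file would be consistent with the two threads seen in jstack, and would also explain why running each test in its own file hides the symptom.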

          Eugene Koontz added a comment -

          Taking a look. Although they are different test failures (SaslAuthFailTest vs WatcherTest), perhaps they're both related in some way.

          Patrick Hunt added a comment -

          Failed a second time:

          https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-jdk7/408/

          java.util.concurrent.TimeoutException: Did not connect
                  at org.apache.zookeeper.test.ClientBase$CountdownWatcher.waitForConnected(ClientBase.java:129)
                  at org.apache.zookeeper.test.WatcherTest.testWatchAutoResetWithPending(WatcherTest.java:199)
          
          Patrick Hunt added a comment -

          Eugene can you take a look at this? Might be related to the recent sasl changes...


            People

            • Assignee:
              Eugene Koontz
              Reporter:
              Patrick Hunt
            • Votes:
              0
              Watchers:
              6
