Derby
  1. Derby
  2. DERBY-4326

hang on winvista with IBM 1.6 SR5 in NSSecurityMechanismTest

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.5.2.0
    • Fix Version/s: 10.5.3.1, 10.6.1.0
    • Component/s: Network Server
    • Labels:
      None
    • Urgency:
      Urgent
    • Bug behavior facts:
      Crash, Regression Test Failure

      Description

      This is stopping my platform testing on winvista with ibm 1.6 SR5.

      During the initial run of suites.All, I got:
      --------------------------------
      testNetworkServerSecurityMechanism java.net.SocketException: Connection reset by peer: setsockopt
      at java.net.PlainSocketImpl.socketSetOption(Native Method)
      at java.net.PlainSocketImpl.setOption(PlainSocketImpl.java:283)
      at java.net.Socket.setKeepAlive(Socket.java:1171)
      at org.apache.derby.impl.drda.ClientThread.run(Unknown Source)
      --------------------------------
      Rerunning suites.All gave the same error.
      There is only 1 java process active & hanging.

      Running just the test itself also results in a hang but there was (so far - it's been sitting unchanged for 4 hours) java.net.SocketException.

      1. DERBY4326_printExceptions_patch2_diff.txt
        3 kB
        Mamta A. Satoor
      2. mamtapatch1result.jar
        3 kB
        Myrna van Lunteren
      3. run.out
        12 kB
        Myrna van Lunteren
      4. DERBY4326_notForCommit_patch1_diff.txt
        3 kB
        Mamta A. Satoor
      5. javacore.20090728.114939.5640.0001.txt
        228 kB
        Myrna van Lunteren
      6. javacore.20090728.095131.5196.0001.txt
        639 kB
        Myrna van Lunteren

        Activity

        Hide
        Mamta A. Satoor added a comment -

        Thanks for running the tests on winvista, Myrna. both derbyall and junit suite ran fine on windows xp machine too with IBM 16. Have committed the merged changes into 10.5 codeline using revision 824850. Will go ahead and resolve the issue

        Show
        Mamta A. Satoor added a comment - Thanks for running the tests on winvista, Myrna. both derbyall and junit suite ran fine on windows xp machine too with IBM 16. Have committed the merged changes into 10.5 codeline using revision 824850. Will go ahead and resolve the issue
        Hide
        Myrna van Lunteren added a comment -

        I merged the fix to 10.5, built, then ran suites.All and derbyall on my winvista machine with ibm16 and there were no failures nor errors.

        Show
        Myrna van Lunteren added a comment - I merged the fix to 10.5, built, then ran suites.All and derbyall on my winvista machine with ibm16 and there were no failures nor errors.
        Hide
        Mamta A. Satoor added a comment -

        Committed changes into 823659 in trunk. Will merge into 10.5 codeline next

        Show
        Mamta A. Satoor added a comment - Committed changes into 823659 in trunk. Will merge into 10.5 codeline next
        Hide
        Myrna van Lunteren added a comment -

        just to confirm that the T_RawStoreFactory failure is unrelated, I rebuilt without any patches and I get the same failure. The failure is indicating there is still an observer when it expects 0 observers, and that failure we have seen before (e.g. DERBY-3993)

        Show
        Myrna van Lunteren added a comment - just to confirm that the T_RawStoreFactory failure is unrelated, I rebuilt without any patches and I get the same failure. The failure is indicating there is still an observer when it expects 0 observers, and that failure we have seen before (e.g. DERBY-3993 )
        Hide
        Myrna van Lunteren added a comment -

        I tested patch2 on my winvista machine and ran into no failures except for unit/T_RawStoreFactory.unit.

        I don't think that failure is related, although it appears consistent...

        I have 1 nit: it seems the patch has 1 unnecessary white-character change.
        +1 for commit from me.

        Show
        Myrna van Lunteren added a comment - I tested patch2 on my winvista machine and ran into no failures except for unit/T_RawStoreFactory.unit. I don't think that failure is related, although it appears consistent... I have 1 nit: it seems the patch has 1 unnecessary white-character change. +1 for commit from me.
        Hide
        Mamta A. Satoor added a comment -

        I am attaching a new patch DERBY4326_printExceptions_patch2_diff.txt which ran derbyall fine (with no derbyNetAutoStart failure). During an IOException, I was printing the stack trace before checking if the exception was for server shutdown. This patch first checks for server shutdown and if we are not dealing with that, then we should log the exception trace. If there are no comments, I will work on committing this patch soon.

        Show
        Mamta A. Satoor added a comment - I am attaching a new patch DERBY4326_printExceptions_patch2_diff.txt which ran derbyall fine (with no derbyNetAutoStart failure). During an IOException, I was printing the stack trace before checking if the exception was for server shutdown. This patch first checks for server shutdown and if we are not dealing with that, then we should log the exception trace. If there are no comments, I will work on committing this patch soon.
        Hide
        Myrna van Lunteren added a comment -

        Forgot to mention, that I did get a full-no-failures-run of suites.All on my vista machine - but I can no longer remember if this was with ibm 16 and -Xmx1024M -Xms1024M or sun jdk 1.6 with -XX:MaxPermSize=256M. I'm guessing it was the ibm 1.6.

        Show
        Myrna van Lunteren added a comment - Forgot to mention, that I did get a full-no-failures-run of suites.All on my vista machine - but I can no longer remember if this was with ibm 16 and -Xmx1024M -Xms1024M or sun jdk 1.6 with -XX:MaxPermSize=256M. I'm guessing it was the ibm 1.6.
        Hide
        Myrna van Lunteren added a comment -

        With a build with mamta's patch applied I ran derbyall and suites.All on my winvista box and on my XP.
        On winvista, this time I ran with sun's jdk 1.6 but forgot to set -XX:MaxPermSize=128M for suites.All and ran out of memory during the upgrade tests...
        However, I did see a couple of java.net.socketExceptions logged, in the following tests:

        • derbyall: derbyNetAutoStart. (both XP & my winvista).
          The diff is like this:
          --------------------
          1a2
          > java.net.SocketException: socket closed
          2a4
          > java.net.SocketException: socket closed
          -------------------
          As we're now logging SocketExceptions, I think this can be expected. If indeed it is expected, we can filter this out again using a _sed.properties file.
        • suites.All:
          on both XP & netvista:
        • DerbyNetNewServer. (test passed anyway),
        • ttestSetPortPriority (4x. test passed anyway).
          The difference is in fixture ttestNetworkServerSecurityMechanism; both have 3 identical stack traces, but the first one is different on my winvista box vs. XP:
          winvista:
          ----------------------
          testNetworkServerSecurityMechanism java.net.SocketException: Connection reset by peer: setsockopt
          at java.net.PlainSocketImpl.socketSetOption(Native Method)
          at java.net.PlainSocketImpl.setOption(PlainSocketImpl.java:264)
          at java.net.Socket.setKeepAlive(Socket.java:1160)
          at org.apache.derby.impl.drda.ClientThread.run(ClientThread.java:79)
          ----------------------
          xp:
          ----------------------
          testNetworkServerSecurityMechanism java.net.SocketException: socket closed
          at java.net.PlainSocketImpl.socketAccept(Native Method)
          at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:414)
          at java.net.ServerSocket.implAccept(ServerSocket.java:464)
          at java.net.ServerSocket.accept(ServerSocket.java:432)
          at org.apache.derby.impl.drda.ClientThread$1.run(ClientThread.java:67)
          at java.security.AccessController.doPrivileged(AccessController.java:251)
          at org.apache.derby.impl.drda.ClientThread.run(ClientThread.java:62)
          ----------------------
          Even so, it now looks as though this test now passes on both OSs.
        Show
        Myrna van Lunteren added a comment - With a build with mamta's patch applied I ran derbyall and suites.All on my winvista box and on my XP. On winvista, this time I ran with sun's jdk 1.6 but forgot to set -XX:MaxPermSize=128M for suites.All and ran out of memory during the upgrade tests... However, I did see a couple of java.net.socketExceptions logged, in the following tests: derbyall: derbyNetAutoStart. (both XP & my winvista). The diff is like this: -------------------- 1a2 > java.net.SocketException: socket closed 2a4 > java.net.SocketException: socket closed ------------------- As we're now logging SocketExceptions, I think this can be expected. If indeed it is expected, we can filter this out again using a _sed.properties file. suites.All: on both XP & netvista: DerbyNetNewServer. (test passed anyway), ttestSetPortPriority (4x. test passed anyway). The difference is in fixture ttestNetworkServerSecurityMechanism; both have 3 identical stack traces, but the first one is different on my winvista box vs. XP: winvista: ---------------------- testNetworkServerSecurityMechanism java.net.SocketException: Connection reset by peer: setsockopt at java.net.PlainSocketImpl.socketSetOption(Native Method) at java.net.PlainSocketImpl.setOption(PlainSocketImpl.java:264) at java.net.Socket.setKeepAlive(Socket.java:1160) at org.apache.derby.impl.drda.ClientThread.run(ClientThread.java:79) ---------------------- xp: ---------------------- testNetworkServerSecurityMechanism java.net.SocketException: socket closed at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:414) at java.net.ServerSocket.implAccept(ServerSocket.java:464) at java.net.ServerSocket.accept(ServerSocket.java:432) at org.apache.derby.impl.drda.ClientThread$1.run(ClientThread.java:67) at java.security.AccessController.doPrivileged(AccessController.java:251) at org.apache.derby.impl.drda.ClientThread.run(ClientThread.java:62) ---------------------- Even so, it now looks as though this test now passes on both OSs.
        Hide
        Kathey Marsden added a comment -

        I did not apply but looked briefly at Mamta's patch. I think in the shutdown case we should not log the interrupt exception as it is an expected part of the shutdown.

        In the code:
        + if (parent.getShutdown())

        { + clientSocket.close(); + return; }

        We should probably also check that clientSocket != null.
        Otherwise it looks ok except for some comment cleanup.

        Show
        Kathey Marsden added a comment - I did not apply but looked briefly at Mamta's patch. I think in the shutdown case we should not log the interrupt exception as it is an expected part of the shutdown. In the code: + if (parent.getShutdown()) { + clientSocket.close(); + return; } We should probably also check that clientSocket != null. Otherwise it looks ok except for some comment cleanup.
        Hide
        Myrna van Lunteren added a comment -

        Attaching a jar-ed up of the results directory (minus system\wombat) from running with mamta's not-for-commit patch (ran with modified sane trunk jars, and ibm1.6). The test fails, but at least the hang is gone.
        Apparently, judging from derby.log I'm getting 'invalid authentication' - or is that expected with this test?

        Show
        Myrna van Lunteren added a comment - Attaching a jar-ed up of the results directory (minus system\wombat) from running with mamta's not-for-commit patch (ran with modified sane trunk jars, and ibm1.6). The test fails, but at least the hang is gone. Apparently, judging from derby.log I'm getting 'invalid authentication' - or is that expected with this test?
        Hide
        Myrna van Lunteren added a comment -

        attaching the file to which I had sent the console output when running the test with ibm 1.6 and sane trunk jars with the test change suggested by Kathey.
        Nothing happened at all, until I used the SendSignal.exe tool at which time the java core dump also ended up in the redirected output.
        The only other info was in logs\consoleServerOutput.log:
        -------------------
        2009-08-04 23:23:41.969 GMT : Apache Derby Network Server - 10.6.0.0 alpha - (80
        0946M) started and ready to accept connections on port 1527
        2009-08-04 23:23:47.859 GMT : Connection reset by peer: setsockopt
        java.net.SocketException: Connection reset by peer: setsockopt
        at java.net.PlainSocketImpl.socketSetOption(Native Method)
        at java.net.PlainSocketImpl.setOption(PlainSocketImpl.java:264)
        at java.net.Socket.setKeepAlive(Socket.java:1160)
        at org.apache.derby.impl.drda.ClientThread.run(ClientThread.java:79)
        ---------------------

        Show
        Myrna van Lunteren added a comment - attaching the file to which I had sent the console output when running the test with ibm 1.6 and sane trunk jars with the test change suggested by Kathey. Nothing happened at all, until I used the SendSignal.exe tool at which time the java core dump also ended up in the redirected output. The only other info was in logs\consoleServerOutput.log: ------------------- 2009-08-04 23:23:41.969 GMT : Apache Derby Network Server - 10.6.0.0 alpha - (80 0946M) started and ready to accept connections on port 1527 2009-08-04 23:23:47.859 GMT : Connection reset by peer: setsockopt java.net.SocketException: Connection reset by peer: setsockopt at java.net.PlainSocketImpl.socketSetOption(Native Method) at java.net.PlainSocketImpl.setOption(PlainSocketImpl.java:264) at java.net.Socket.setKeepAlive(Socket.java:1160) at org.apache.derby.impl.drda.ClientThread.run(ClientThread.java:79) ---------------------
        Hide
        Myrna van Lunteren added a comment -

        As an aside...I thought I'd try to run suites.All with junit.textui.TestRunner with only embedded. First I tried without derbynet.jar and derbyclient.jar in my classpath, but they still got picked up by sysinfo, so I then removed them physically from the dir where the other derby jars lived. This resulted in this error:
        Failed to invoke suite():java.lang.NoClassDefFoundError: org.apache.derby.jdbc.ClientConnectionPoolDataSource
        So apparently, there's no way to do this...

        Show
        Myrna van Lunteren added a comment - As an aside...I thought I'd try to run suites.All with junit.textui.TestRunner with only embedded. First I tried without derbynet.jar and derbyclient.jar in my classpath, but they still got picked up by sysinfo, so I then removed them physically from the dir where the other derby jars lived. This resulted in this error: Failed to invoke suite():java.lang.NoClassDefFoundError: org.apache.derby.jdbc.ClientConnectionPoolDataSource So apparently, there's no way to do this...
        Hide
        Myrna van Lunteren added a comment -

        The last successful run on this machine was with 10.3 and ibm's 1.4.2. The test has undergone some changes since that time...

        grabbed this issue while I will do the experiments suggested by Kathey and try out Mamta's patch.

        I did finally try this test on this machine with Sun's 1.6 JDK and the test had been going for up to 1 hour - (on my windows XP the networkserver fixture testNetworkServerSecurityMechanism takes about 2703 ms). So, looks like the same problem and it doesn't look like a jvm issue.

        Show
        Myrna van Lunteren added a comment - The last successful run on this machine was with 10.3 and ibm's 1.4.2. The test has undergone some changes since that time... grabbed this issue while I will do the experiments suggested by Kathey and try out Mamta's patch. I did finally try this test on this machine with Sun's 1.6 JDK and the test had been going for up to 1 hour - (on my windows XP the networkserver fixture testNetworkServerSecurityMechanism takes about 2703 ms). So, looks like the same problem and it doesn't look like a jvm issue.
        Hide
        Tiago R. Espinha added a comment -

        I might be experiencing this same issue under Ubuntu with Sun's 1.6 JDK. My NSSecurityMechanismTest is taking around 22 minutes to run which seems to be a lot.

        Show
        Tiago R. Espinha added a comment - I might be experiencing this same issue under Ubuntu with Sun's 1.6 JDK. My NSSecurityMechanismTest is taking around 22 minutes to run which seems to be a lot.
        Hide
        Mamta A. Satoor added a comment -

        Attaching a patch DERBY4326_notForCommit_patch1_diff.txt. This patch is not ready for commit.

        I changed the code so that we don't simply return from the ClientThread when we run into InterruptedException or IOException. For InterruptedException, after putting something into the log about the exception, I check if the server has been shutdown and if so, then we should leave the ClientThread otherwise, we simply close the client socket. For IOException, I made changes to log it and then close the client socket and resume the loop of waiting for the next client request. Running derbyall with these changes made derbynet/DerbyNetAutoStart.java to fail. It appears that the test get SocketException which is a kind of IOException which I guess you can get if the server is down. I am not 100% about the cause but just printing and closing client socket for a SocketException is making the test to fail. I am going to have to stop working on this jira for a while until I look at DERBY-4331.

        Show
        Mamta A. Satoor added a comment - Attaching a patch DERBY4326_notForCommit_patch1_diff.txt. This patch is not ready for commit. I changed the code so that we don't simply return from the ClientThread when we run into InterruptedException or IOException. For InterruptedException, after putting something into the log about the exception, I check if the server has been shutdown and if so, then we should leave the ClientThread otherwise, we simply close the client socket. For IOException, I made changes to log it and then close the client socket and resume the loop of waiting for the next client request. Running derbyall with these changes made derbynet/DerbyNetAutoStart.java to fail. It appears that the test get SocketException which is a kind of IOException which I guess you can get if the server is down. I am not 100% about the cause but just printing and closing client socket for a SocketException is making the test to fail. I am going to have to stop working on this jira for a while until I look at DERBY-4331 .
        Hide
        Myrna van Lunteren added a comment -

        re 1) I didn't run deryball/suites.All on this machine for 10.5.1.1 testing.
        I'll try to figure out what combination the last successful testing cycle on this machine used.

        Show
        Myrna van Lunteren added a comment - re 1) I didn't run deryball/suites.All on this machine for 10.5.1.1 testing. I'll try to figure out what combination the last successful testing cycle on this machine used.
        Hide
        Kathey Marsden added a comment -

        While there is clearly a Network Server bug in its handling of the exception, I wonder if the setKeepAlive() error is a JVM regression. Maybe that socket is getting cleaned up too early on the client with the new service release. Myrna could you try two things:
        1) Try with the jdk version you used with 10.5.1.1 testing?
        2) Try with the jvm where you are getting the hang with the following test change. Instead of:
        DriverManager.getConnection(dbUrl).close();
        do
        Conection conn = getConnection(dbUrl).
        conn.close();

        Show
        Kathey Marsden added a comment - While there is clearly a Network Server bug in its handling of the exception, I wonder if the setKeepAlive() error is a JVM regression. Maybe that socket is getting cleaned up too early on the client with the new service release. Myrna could you try two things: 1) Try with the jdk version you used with 10.5.1.1 testing? 2) Try with the jvm where you are getting the hang with the following test change. Instead of: DriverManager.getConnection(dbUrl).close(); do Conection conn = getConnection(dbUrl). conn.close();
        Hide
        Myrna van Lunteren added a comment -

        I also got the same hang with 10.5.1.1.
        As others have tested on winvista successfully, I'm not complaining about this much, but it does prevent me from running the test suite.
        For now, I'll see if there's something peculiar with the machine.

        Show
        Myrna van Lunteren added a comment - I also got the same hang with 10.5.1.1. As others have tested on winvista successfully, I'm not complaining about this much, but it does prevent me from running the test suite. For now, I'll see if there's something peculiar with the machine.
        Hide
        Kathey Marsden added a comment -

        Marking crash since the server is unusable and all client requests will hang after it occurs.

        Show
        Kathey Marsden added a comment - Marking crash since the server is unusable and all client requests will hang after it occurs.
        Hide
        Kathey Marsden added a comment -

        The thread dump in the javacore reveals that this exception causes our ClientThread to go away, but the server stays up blocked in blockingStart() but has no ClientThread to handle new requests. I think typically the error Connection reset by peer would happen if the client closed the socket. Even though the client does do a quick close:
        DriverManager.getConnection(dbUrl).close();
        I don't think the getConnection() should return until we are well through this setKeepAlive code. Regardless of why it happened I don't think such an error should cause loss of the ClientThread. Currently we have:

        clientSocket.setKeepAlive(parent.getKeepAlive());

        // Set time out: Stops DDMReader.fill() from
        // waiting indefinitely when timeSlice is set.
        if (timeSlice > 0)
        clientSocket.setSoTimeout(timeSlice);

        //create a new Session for this socket
        parent.addSession(clientSocket);

        } catch (PrivilegedActionException e)

        { // Just throw the underlying exception throw e.getException(); }

        // end inner try/catch block

        Show
        Kathey Marsden added a comment - The thread dump in the javacore reveals that this exception causes our ClientThread to go away, but the server stays up blocked in blockingStart() but has no ClientThread to handle new requests. I think typically the error Connection reset by peer would happen if the client closed the socket. Even though the client does do a quick close: DriverManager.getConnection(dbUrl).close(); I don't think the getConnection() should return until we are well through this setKeepAlive code. Regardless of why it happened I don't think such an error should cause loss of the ClientThread. Currently we have: clientSocket.setKeepAlive(parent.getKeepAlive()); // Set time out: Stops DDMReader.fill() from // waiting indefinitely when timeSlice is set. if (timeSlice > 0) clientSocket.setSoTimeout(timeSlice); //create a new Session for this socket parent.addSession(clientSocket); } catch (PrivilegedActionException e) { // Just throw the underlying exception throw e.getException(); } // end inner try/catch block
        Hide
        Myrna van Lunteren added a comment -

        In the run of the test by itself, the socket reset appears to end up in logs/serverConsoleOutput.log:

        2009-07-28 17:04:06.967 GMT : Apache Derby Network Server - 10.5.2.0 - (794445) started and ready to accept connections on port 1527
        2009-07-28 17:04:37.169 GMT : Connection reset by peer: setsockopt
        java.net.SocketException: Connection reset by peer: setsockopt
        at java.net.PlainSocketImpl.setOption(PlainSocketImpl.java:283)
        at java.net.Socket.setKeepAlive(Socket.java:1171)
        at org.apache.derby.impl.drda.ClientThread.run(Unknown Source)

        Show
        Myrna van Lunteren added a comment - In the run of the test by itself, the socket reset appears to end up in logs/serverConsoleOutput.log: 2009-07-28 17:04:06.967 GMT : Apache Derby Network Server - 10.5.2.0 - (794445) started and ready to accept connections on port 1527 2009-07-28 17:04:37.169 GMT : Connection reset by peer: setsockopt java.net.SocketException: Connection reset by peer: setsockopt at java.net.PlainSocketImpl.setOption(PlainSocketImpl.java:283) at java.net.Socket.setKeepAlive(Socket.java:1171) at org.apache.derby.impl.drda.ClientThread.run(Unknown Source)
        Hide
        Myrna van Lunteren added a comment -

        Attaching 2 javacores; the first one - of process id 5196 - was from the rerun of suites.All that hung; the second one - of process id 5640 - was from the run of just NSSecurityMechanismTest.
        I did create another core dump about 10 mins after the run of the test by itself, but the only difference was the name of the java core dump file itself.

        Show
        Myrna van Lunteren added a comment - Attaching 2 javacores; the first one - of process id 5196 - was from the rerun of suites.All that hung; the second one - of process id 5640 - was from the run of just NSSecurityMechanismTest. I did create another core dump about 10 mins after the run of the test by itself, but the only difference was the name of the java core dump file itself.

          People

          • Assignee:
            Mamta A. Satoor
            Reporter:
            Myrna van Lunteren
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development