I have the machine back in the state where this reproduces. Unfortunately there is still a hang, now in a different method, even with my earlier attempt to get past it. Since I can reproduce it now, I should be able to make some progress on this issue. I'll record some information here in case it becomes hard to reproduce again.
In the current hang, the launched Network Server process appears to specify all of the drda parameters without values:
cloudtst 6488248 4390978 0 14:41:38 - 0:20 /local1/IBM_JDK/15sr13/sdk/jre/bin/java -classpath /local1/kmarsden/repro/derby-4319/jars//derby.jar:/local1/
drda.logConnections= -Dderby.drda.traceAll= -Dderby.drda.traceDirectory= -Dderby.drda.keepAlive= -Dderby.drda.timeSlice= -Dderby.drda.host= -Dderby.drda.portNumber= -Dderby.drda.minThreads= -Dderby.drda.maxThreads= -Dderby.drda.startNetworkServer= -Dderby.drda.debug= org.apache.derby.drda.NetworkServerControl start -h localhost -p 1527
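Those empty options may matter because, on a standard JVM, an option such as -Dderby.drda.keepAlive= (nothing after the equals sign) defines the property as an empty string rather than leaving it undefined, so code that only checks for null treats it as set. Whether Derby's own property handling is affected this way is an assumption, not something verified here. The minimal sketch below shows the difference; the class DrdaPropertyCheck and the helper readOptionalProperty are hypothetical and are not how Derby reads these properties.

```java
import java.util.Properties;

/**
 * Sketch (hypothetical helper, not Derby code): a JVM option such as
 * "-Dderby.drda.keepAlive=" defines the property as an empty string,
 * so a null check alone treats it as "set".
 */
public class DrdaPropertyCheck {

    /** Returns the trimmed value, or null if the property is missing or blank. */
    public static String readOptionalProperty(Properties props, String name) {
        String value = props.getProperty(name);
        if (value == null || value.trim().length() == 0) {
            return null; // treat "-Dname=" exactly like not passing the option
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // In a real launch these would come from System.getProperties();
        // a local Properties object keeps the sketch self-contained.
        Properties props = new Properties();
        props.setProperty("derby.drda.keepAlive", ""); // what "-Dderby.drda.keepAlive=" yields

        String raw = props.getProperty("derby.drda.keepAlive");
        System.out.println("raw is null? " + (raw == null));
        System.out.println("normalized: " + readOptionalProperty(props, "derby.drda.keepAlive"));
        // Boolean.parseBoolean("") returns false, which would switch the option
        // off rather than falling back to a default; Integer.parseInt("") would
        // throw NumberFormatException instead.
        System.out.println("naive parse: " + Boolean.parseBoolean(raw));
    }
}
```

Normalizing blank values to null at a single read point is one way to make "-Dname=" behave like an omitted option.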
I will attach the javacore with thread dump as LaunchedNetworkServer.javacore.20110309.160148.6488248.0001.txt
The server threads look pretty normal, with a ClientThread running and waiting to accept requests.
The test process is hung in NetworkServerTestSetup.complete(). I am not sure whether the hang has just moved to a later point or whether the change I made simply did not work. I will attach the test process file as:
If I try to ping the server from the command line, I get a Connection reset error:
$ java org.apache.derby.drda.NetworkServerControl ping
Thu Mar 10 12:47:39 PST 2011 : Error on client socket:
Thu Mar 10 12:47:39 PST 2011 : Connection reset
java.net.SocketException: Connection reset
After that, subsequent ping attempts hang, and a new thread dump of the Network Server process shows that the ClientThread is no longer there. I think this should never happen: a lot of work has been put into making sure that the ClientThread survives any type of error so that it can keep accepting connections. See attachment LaunchedNetworkServerAfterPing.javacore.20110310.124948.6488248.0002.txt
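The expectation that the accepting thread survives any per-connection error can be sketched as a loop that catches failures inside each iteration and only exits on an explicit shutdown. This is a hypothetical illustration, not Derby's ClientThread; the class ResilientAcceptLoop and its simulated handler failure are made up for the example.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical accept loop (not Derby's ClientThread) that keeps accepting
 * connections after a failure while handling any single connection.
 */
public class ResilientAcceptLoop implements Runnable {
    private final ServerSocket serverSocket;
    private final boolean simulateFirstHandlerFailure;
    private final AtomicInteger accepted = new AtomicInteger();
    private final AtomicInteger errors = new AtomicInteger();
    private volatile boolean shutdown = false;

    public ResilientAcceptLoop(ServerSocket serverSocket, boolean simulateFirstHandlerFailure) {
        this.serverSocket = serverSocket;
        this.simulateFirstHandlerFailure = simulateFirstHandlerFailure;
    }

    public int getAcceptedCount() { return accepted.get(); }
    public int getErrorCount() { return errors.get(); }

    /** Stops the loop; closing the socket unblocks a pending accept(). */
    public void shutdown() throws IOException {
        shutdown = true;
        serverSocket.close();
    }

    @Override
    public void run() {
        while (!shutdown) {
            try {
                Socket client = serverSocket.accept();
                int n = accepted.incrementAndGet();
                handle(client, n);
            } catch (IOException e) {
                if (shutdown) {
                    break; // expected: shutdown() closed the server socket
                }
                errors.incrementAndGet(); // e.g. a reset during accept; keep looping
            } catch (RuntimeException e) {
                errors.incrementAndGet(); // one bad connection must not end the loop
            }
            // Errors such as OutOfMemoryError are intentionally not caught here.
        }
    }

    private void handle(Socket client, int connectionNumber) throws IOException {
        // A real server would hand the socket to a worker; the sketch just closes it.
        client.close();
        if (simulateFirstHandlerFailure && connectionNumber == 1) {
            throw new IllegalStateException("simulated failure on connection 1");
        }
    }

    public static void main(String[] args) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        ServerSocket ss = new ServerSocket(0, 50, loopback);
        ResilientAcceptLoop loop = new ResilientAcceptLoop(ss, true);
        Thread acceptor = new Thread(loop, "acceptor");
        acceptor.start();
        for (int i = 0; i < 2; i++) {
            new Socket(loopback, ss.getLocalPort()).close();
        }
        long deadline = System.currentTimeMillis() + 5000;
        while (loop.getAcceptedCount() < 2 && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);
        }
        System.out.println("accepted=" + loop.getAcceptedCount()
                + " errors=" + loop.getErrorCount()
                + " acceptorAlive=" + acceptor.isAlive());
        loop.shutdown();
        acceptor.join(5000);
    }
}
```

The second connection is still accepted after the first one fails, which is the behavior the ClientThread would be expected to show after the ping that reset.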
Another thing to note is that before the defaultProperties test, there was actually a stack trace with a Connection reset in the testSetPortPriority test, which did not cause a failure. See TestOutput2011-03-09.txt .out
This issue actually has many facets that are worth working on:
1) How do we make sure a spawned network server process is destroyed if it hangs the whole suite?
2) Under what circumstances can the Network Server ClientThread that loops accepting new connections be destroyed?
3) What sort of problem is being caused on AIX by starting the Network Server with these odd options? I suspect it may be related to soTimeout or keepAlive being set to an unexpected value, but I am not sure.
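For question 1, one general approach is a watchdog that waits for the spawned process with a timeout and forcibly kills it if the timeout expires. The sketch below is hypothetical (ProcessWatchdog, runWithTimeout and TIMED_OUT are made-up names, not part of Derby's test harness) and relies on Process.waitFor(long, TimeUnit) and Process.destroyForcibly(), which require Java 8 or later; they are not available on the IBM 1.5 JDK used in this reproduction, where a timer thread calling Process.destroy() would be the closest equivalent.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical watchdog for a spawned child process (requires Java 8+ for
 * Process.waitFor(long, TimeUnit) and Process.destroyForcibly()).
 */
public class ProcessWatchdog {
    /** Returned instead of an exit code when the child had to be killed. */
    public static final int TIMED_OUT = Integer.MIN_VALUE;

    public static int runWithTimeout(List<String> command, long timeout, TimeUnit unit)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Merge stderr into stdout and pass both through, so a chatty child
        // cannot block on a full pipe buffer while we wait for it.
        pb.redirectErrorStream(true);
        pb.redirectOutput(ProcessBuilder.Redirect.INHERIT);
        Process child = pb.start();
        try {
            if (child.waitFor(timeout, unit)) {
                return child.exitValue();
            }
            // Timed out: kill the child so it cannot hold the port or hang the suite.
            child.destroyForcibly();
            child.waitFor(10, TimeUnit.SECONDS);
            return TIMED_OUT;
        } finally {
            // Also reached if waitFor() is interrupted.
            if (child.isAlive()) {
                child.destroyForcibly();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // "sleep 30" stands in for a hung server process (Unix only).
        int result = runWithTimeout(Arrays.asList("sleep", "30"), 2, TimeUnit.SECONDS);
        System.out.println(result == TIMED_OUT ? "child killed after timeout" : "exit code " + result);
    }
}
```

Using a sentinel outside the normal exit-code range keeps a killed child distinguishable from one that exited with an error.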
I have been holding off on working on 3 because it provides a good reproduction for issues 1 and 2, but I think that at this point the best thing to do is to disable the problematic fixture on AIX, whether it is testSetPortPriority or testDefaultProperties. Then I can work on all three issues in a logical order and at a reasonable pace without release concerns. I'll look into doing that.