I was wondering if the root cause of these failures could be the same problem that caused DERBY-4201, so I ran the replication tests with the repro patch attached to that issue. And indeed, many of the replication tests failed with "connection refused" when they ran with the patched code.
So it seems that one possible cause of the problem reported here is that a server (slave or master) is not fully shut down after ReplicationRun.tearDown() has completed. tearDown() invokes a shutdown command on the slave server and on the master server. However, as seen in DERBY-4201, a server shutdown command returns when the server stops responding, which happens before the server is fully closed down. So if a new network server is started shortly thereafter, it may fail to start because the port hasn't been released yet. Since the network server doesn't start, clients that try to connect will see "connection refused" errors.
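To illustrate the failure mode, here is a minimal sketch in plain java.net (not Derby code). The class PortInUseDemo and the method canBind are hypothetical names made up for this example. It only shows that a second listener cannot bind a port while an earlier one still holds it, which is the situation a new network server would be in if the old one has not finished closing down.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical demo class; not part of Derby or of the attached patch. */
public class PortInUseDemo {

    /** Returns true if a listening socket can currently be bound to the port. */
    public static boolean canBind(int port) {
        try (ServerSocket probe = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            // Typically a BindException ("address already in use").
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the old server that has not released its port yet.
        try (ServerSocket oldServer = new ServerSocket(0)) {
            int port = oldServer.getLocalPort();
            System.out.println("old server still holds port " + port
                    + ", canBind=" + canBind(port));
        }
    }
}
```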
The attached patch attempts to address this issue by letting ReplicationRun.tearDown() wait until all server processes have completed. It does that by keeping a list of the java.lang.Thread instances that read the output from the server processes, and calling join() on all those threads in tearDown(). This way, we won't start the next test case (and the next server) until the servers started by the previous test case have terminated.
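The approach described above can be sketched roughly as follows. This is not the code in the attached patch: ServerProcessTracker, start, waitForAll and pendingCount are hypothetical names, and the sketch assumes each server runs as a separate process started through ProcessBuilder. Each reader thread drains the process output until end of stream and then waits for the process to exit, so joining the thread implies that the process is gone.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; a sketch of the idea, not the attached patch. */
public class ServerProcessTracker {

    // One output-reader thread per server process started by the test.
    private final List<Thread> outputReaders = new ArrayList<>();

    /** Starts a process plus a thread that drains its output until it exits. */
    public synchronized void start(String... command) throws IOException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout
        final Process process = builder.start();

        Thread reader = new Thread(() -> {
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(process.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println("[server] " + line);
                }
                // End of stream reached; also wait for the process itself.
                process.waitFor();
            } catch (IOException e) {
                e.printStackTrace();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();
        outputReaders.add(reader);
    }

    /** Intended to be called from tearDown(): blocks until all readers finish. */
    public synchronized void waitForAll() throws InterruptedException {
        for (Thread reader : outputReaders) {
            reader.join();
        }
        outputReaders.clear();
    }

    /** Number of reader threads that have not been joined yet. */
    public synchronized int pendingCount() {
        return outputReaders.size();
    }
}
```

A test's tearDown() would issue the shutdown commands first and then call waitForAll(), so that the next test case cannot start a server while an old one still holds the port.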
With this patch, the replication test suite ran cleanly, even with the repro patch from