Description
Could be related to SPARK-10680
This is the test. One fix would be to increase the timeouts from 1.2 seconds to 5 seconds:
  // The timeout is relative to the LAST request sent, which is kinda weird, but still.
  // This test also makes sure the timeout works for Fetch requests as well as RPCs.
  @Test
  public void furtherRequestsDelay() throws Exception {
    final byte[] response = new byte[16];
    final StreamManager manager = new StreamManager() {
      @Override
      public ManagedBuffer getChunk(long streamId, int chunkIndex) {
        Uninterruptibles.sleepUninterruptibly(FOREVER, TimeUnit.MILLISECONDS);
        return new NioManagedBuffer(ByteBuffer.wrap(response));
      }
    };
    RpcHandler handler = new RpcHandler() {
      @Override
      public void receive(
          TransportClient client,
          ByteBuffer message,
          RpcResponseCallback callback) {
        throw new UnsupportedOperationException();
      }

      @Override
      public StreamManager getStreamManager() {
        return manager;
      }
    };

    TransportContext context = new TransportContext(conf, handler);
    server = context.createServer();
    clientFactory = context.createClientFactory();
    TransportClient client = clientFactory.createClient(TestUtils.getLocalHost(), server.getPort());

    // Send one request, which will eventually fail.
    TestCallback callback0 = new TestCallback();
    client.fetchChunk(0, 0, callback0);
    Uninterruptibles.sleepUninterruptibly(1200, TimeUnit.MILLISECONDS);

    // Send a second request before the first has failed.
    TestCallback callback1 = new TestCallback();
    client.fetchChunk(0, 1, callback1);
    Uninterruptibles.sleepUninterruptibly(1200, TimeUnit.MILLISECONDS);

    // not complete yet, but should complete soon
    assertEquals(-1, callback0.successLength);
    assertNull(callback0.failure);
    callback0.latch.await(60, TimeUnit.SECONDS);
    assertTrue(callback0.failure instanceof IOException);

    // failed at same time as previous
    // This is where we fail because callback1.failure is null
    assertTrue(callback1.failure instanceof IOException);
  }
If there are better suggestions for improving this test, let's take them on board. I think using 5-second timeout periods would be a place to start, so folks don't have to triage this failure needlessly. I will add a few prints and report back.
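One other option that might be worth trying, separate from the longer timeouts: the test only waits on callback0.latch before it reads callback1.failure, so if the second failure is delivered even slightly later, the field could still be null at the assert. Below is a minimal standalone sketch of waiting on each callback's own latch before checking its failure. SketchCallback and failAfter are hypothetical stand-ins invented for this illustration (they are not Spark classes), and it assumes TestCallback counts down its latch on failure, which the test above seems to rely on.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchAwaitSketch {

  /** Hypothetical stand-in for TestCallback: records a failure, then releases its latch. */
  static final class SketchCallback {
    final CountDownLatch latch = new CountDownLatch(1);
    volatile Throwable failure;

    void onFailure(Throwable t) {
      failure = t;
      latch.countDown();
    }
  }

  /** Simulates a request timing out: fails the callback from another thread after delayMs. */
  static void failAfter(SketchCallback cb, long delayMs) {
    Thread t = new Thread(() -> {
      try {
        Thread.sleep(delayMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      cb.onFailure(new IOException("simulated timeout"));
    });
    t.setDaemon(true);
    t.start();
  }

  public static void main(String[] args) throws InterruptedException {
    SketchCallback callback0 = new SketchCallback();
    SketchCallback callback1 = new SketchCallback();

    // The two failures are deliberately delivered at different times.
    failAfter(callback0, 50);
    failAfter(callback1, 150);

    // Wait on each callback's own latch before reading its failure field,
    // instead of assuming both fail at the same instant.
    boolean done0 = callback0.latch.await(60, TimeUnit.SECONDS);
    boolean done1 = callback1.latch.await(60, TimeUnit.SECONDS);
    if (!done0 || !done1) {
      throw new AssertionError("a callback did not complete in time");
    }
    if (!(callback0.failure instanceof IOException)
        || !(callback1.failure instanceof IOException)) {
      throw new AssertionError("expected both callbacks to fail with IOException");
    }
    System.out.println("both callbacks failed with IOException");
  }
}
```

In the real test this would amount to adding a callback1.latch.await(60, TimeUnit.SECONDS) call before the final assertTrue. Whether that is enough, or the sleeps also need to grow, is something the extra prints should help confirm.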