Details
Bug
Status: Resolved
Major
Resolution: Fixed
Reviewed
Description
The RegionServer (RS) calls CompactSplit#join to stop all CompactSplit threads.
CompactSplit.java
private void waitFor(ThreadPoolExecutor t, String name) {
  boolean done = false;
  while (!done) {
    try {
      done = t.awaitTermination(60, TimeUnit.SECONDS);
      LOG.info("Waiting for " + name + " to finish...");
      if (!done) {
        t.shutdownNow();
      }
    } catch (InterruptedException ie) {
      LOG.warn("Interrupted waiting for " + name + " to finish...");
    }
  }
}
Meanwhile, the async WAL may be waiting for the sync signal. However, that signal never arrives because the WAL sync has failed.
synchronized long get(long timeoutNs)
    throws InterruptedException, ExecutionException, TimeoutIOException {
  final long done = System.nanoTime() + timeoutNs;
  while (!isDone()) {
    wait(1000);
    if (System.nanoTime() >= done) {
      throw new TimeoutIOException(
          "Failed to get sync result after " + TimeUnit.NANOSECONDS.toMillis(timeoutNs)
              + " ms for txid=" + this.txid + ", WAL system stuck?");
    }
  }
  if (this.throwable != null) {
    throw new ExecutionException(this.throwable);
  }
  return this.doneTxid;
}
When we shut down the mini cluster, JVMClusterUtil#shutdown sends an interrupt signal to all RS threads. The waitFor loop above catches the resulting InterruptedException, which causes it to skip the #shutdownNow call. As a result, the CompactSplit threads stay alive until the timeout expires (default is 5 min).
for (int i = 0; i < 100; ++i) {
  boolean atLeastOneLiveServer = false;
  for (RegionServerThread t : regionservers) {
    if (t.isAlive()) {
      atLeastOneLiveServer = true;
      try {
        LOG.warn("RegionServerThreads remaining, give one more chance before interrupting");
        t.join(1000);
      } catch (InterruptedException e) {
        wasInterrupted = true;
      }
    }
  }
  if (!atLeastOneLiveServer) break;
  for (RegionServerThread t : regionservers) {
    if (t.isAlive()) {
      LOG.warn("RegionServerThreads taking too long to stop, interrupting");
      t.interrupt();
    }
  }
}
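One possible way to avoid the skipped #shutdownNow is to call it from the InterruptedException handler as well. The following is a minimal, self-contained sketch of that idea, not necessarily the change that was committed for this issue. The class WaitForSketch, the extra timeout parameters, the boolean return value, and the demo() method are all hypothetical. The never-released latch only stands in for a worker blocked on a WAL sync that never completes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class WaitForSketch {

  // Hypothetical variant of waitFor: an interrupt also forces shutdownNow(),
  // so blocked workers are interrupted instead of being left running.
  static boolean waitFor(ThreadPoolExecutor pool, String name, long timeout, TimeUnit unit) {
    boolean interrupted = false;
    boolean done = false;
    try {
      while (!done) {
        try {
          done = pool.awaitTermination(timeout, unit);
          if (!done) {
            pool.shutdownNow();
          }
        } catch (InterruptedException ie) {
          // Remember the interrupt, but still stop the workers.
          interrupted = true;
          pool.shutdownNow();
        }
      }
    } finally {
      if (interrupted) {
        // Restore the interrupt status for the caller.
        Thread.currentThread().interrupt();
      }
    }
    return done;
  }

  // Simulates the scenario: a worker blocked forever, and a caller that is
  // interrupted while waiting for the pool to terminate.
  static boolean demo() {
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(1, 1, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch neverReleased = new CountDownLatch(1); // stand-in for the sync signal
    pool.execute(() -> {
      started.countDown();
      try {
        neverReleased.await();
      } catch (InterruptedException e) {
        // Interrupted by shutdownNow(); let the worker exit.
      }
    });
    try {
      started.await();
    } catch (InterruptedException e) {
      throw new IllegalStateException(e);
    }
    pool.shutdown();
    Thread.currentThread().interrupt(); // simulate the interrupt sent during cluster shutdown
    boolean terminated = waitFor(pool, "demo-pool", 60, TimeUnit.SECONDS);
    Thread.interrupted(); // clear the restored flag so the demo leaves no side effect
    return terminated;
  }

  public static void main(String[] args) {
    System.out.println("terminated=" + demo());
  }
}
```

In this sketch the pool terminates shortly after the interrupt because the blocked worker responds to interruption. A worker that ignores interrupts would still keep the loop waiting, so this only addresses the case described above.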