Description
Each snapshot verification run creates dozens of new threads that keep running after the check completes. Over time, these leaked threads can cause an OutOfMemoryError and node failure.
@Test
public void testClusterSnapshotCheckMultipleTimes() throws Exception {
    IgniteEx ignite = startGridsWithCache(3, dfltCacheCfg, CACHE_KEYS_RANGE);

    startClientGrid();

    ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get();

    // Baseline thread count after the cluster and the snapshot are ready.
    int activeThreadsCntBefore = Thread.activeCount();

    int iterations = 10;

    // Verify the same snapshot repeatedly; each run should release its threads.
    for (int i = 0; i < iterations; i++)
        snp(ignite).checkSnapshot(SNAPSHOT_NAME).get();

    int createdThreads = Thread.activeCount() - activeThreadsCntBefore;

    assertTrue("Threads created: " + createdThreads, createdThreads < iterations);
}
The reproducer shows that 10 snapshot checks add approximately 250 new threads.
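`Thread.activeCount()` returns only an estimate, limited to the current thread's group and its subgroups. When chasing a leak like this one, it can help to count live threads by name instead; the `binary-metadata-writer-` prefix from the thread dump in this issue is a natural candidate. The sketch below is a hypothetical helper (`LeakedThreadCounter` is not part of Ignite) built only on `Thread.getAllStackTraces()`. It simulates the leak with daemon threads parked on empty queues.

```java
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical diagnostic helper, not part of Ignite. */
public class LeakedThreadCounter {
    /** Counts live threads in the whole JVM whose name starts with the given prefix. */
    static long countThreads(String namePrefix) {
        return Thread.getAllStackTraces().keySet().stream()
            .filter(Thread::isAlive)
            .filter(t -> t.getName().startsWith(namePrefix))
            .count();
    }

    public static void main(String[] args) {
        String prefix = "demo-writer-";
        long before = countThreads(prefix);

        // Simulate a leak: each "check" starts a worker that parks forever on an empty queue.
        for (int i = 0; i < 3; i++) {
            LinkedBlockingQueue<Object> queue = new LinkedBlockingQueue<>();
            Thread t = new Thread(() -> {
                try {
                    queue.take(); // nothing is ever offered, so this never returns
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, prefix + i);
            t.setDaemon(true); // let the JVM exit despite the parked threads
            t.start();
        }

        System.out.println("Leaked threads: " + (countThreads(prefix) - before));
    }
}
```

In a test, `countThreads("binary-metadata-writer-")` could replace the `Thread.activeCount()` calls so that unrelated threads started by the grid do not affect the assertion.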
A dump of one of the "leaked" threads looks like this:
"binary-metadata-writer-#2208" #2249 prio=5 os_prio=0 tid=0x00007f9974087000 nid=0x65b38 waiting on condition [0x00007f986cf9c000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <merged>(a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at org.apache.ignite.internal.processors.cache.binary.BinaryMetadataFileStore$BinaryMetadataAsyncWriter.body0(BinaryMetadataFileStore.java:460)
	at org.apache.ignite.internal.processors.cache.binary.BinaryMetadataFileStore$BinaryMetadataAsyncWriter.body(BinaryMetadataFileStore.java:441)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
	at java.lang.Thread.run(Thread.java:748)
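The dump shows the writer parked in `LinkedBlockingQueue.take()`. That state is consistent with a queue-driven worker that is started for each check but never stopped: with an empty queue, `take()` blocks indefinitely and the thread never exits. The following is a minimal sketch of that pattern and of an explicit shutdown that ends it. `AsyncWriterSketch`, `submit()` and `stop()` are hypothetical names; this is not Ignite's `BinaryMetadataAsyncWriter` and not the actual fix for this issue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical queue-driven background writer; illustrates the leak pattern only. */
public class AsyncWriterSketch {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Thread worker;
    private volatile boolean stopped;

    public AsyncWriterSketch(String name) {
        worker = new Thread(this::body, name);
    }

    public void start() {
        worker.start();
    }

    public void submit(Runnable task) {
        queue.add(task);
    }

    private void body() {
        try {
            while (!stopped) {
                // With an empty queue the thread parks here (WAITING), as in the dump above.
                Runnable task = queue.take();
                task.run();
            }
        } catch (InterruptedException e) {
            // Interrupt is the shutdown signal: restore the flag and let the thread exit.
            Thread.currentThread().interrupt();
        }
    }

    /** If the owner never calls this, the worker stays parked in take() and is leaked. */
    public void stop() throws InterruptedException {
        stopped = true;
        worker.interrupt();
        worker.join();
    }

    public boolean isAlive() {
        return worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncWriterSketch writer = new AsyncWriterSketch("metadata-writer-sketch");
        writer.start();
        writer.stop();
        System.out.println("worker alive after stop: " + writer.isAlive());
    }
}
```

The design choice sketched here is that whoever creates the worker also owns its shutdown. Each verification run would have to stop every worker it starts, or reuse a single long-lived one, for the thread count to stay flat across iterations.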
Issue Links
- causes: IGNITE-15205 StandaloneWalRecordsIterator closes the kernal context used by Ignite node (Resolved)
- is required by: IGNITE-14794 Add JMX command and metrics for automatic snapshot restore operation (Resolved)