Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
I saw the following thread dump on TC (only relevant parts are kept):
"exchange-worker-#10918%cache.IgniteClusterActivateDeactivateTest0%" #13121 prio=5 os_prio=0 tid=0x00007f0720137800 nid=0xbcf runnable [0x00007f0b46f66000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        - locked <0x00000000df6b3f88> (a java.lang.Object)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeTcpHandshake(TcpCommunicationSpi.java:3676)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3323)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2991)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2872)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2715)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2674)
        at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1655)
        at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1706)
        at org.apache.ignite.internal.processors.cluster.ClusterProcessor.sendDiagnosticMessage(ClusterProcessor.java:614)
        at org.apache.ignite.internal.processors.cluster.ClusterProcessor.requestDiagnosticInfo(ClusterProcessor.java:556)
        at org.apache.ignite.internal.IgniteDiagnosticPrepareContext.send(IgniteDiagnosticPrepareContext.java:131)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.dumpDebugInfo(GridCachePartitionExchangeManager.java:1914)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2914)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2721)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at java.lang.Thread.run(Thread.java:748)
...
"start-node-3" #13223 prio=5 os_prio=0 tid=0x00007f08a8001800 nid=0xc30 waiting on condition [0x00007f0a577f5000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
        at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1099)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2040)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1732)
        - locked <0x00000000959ae1d0> (a org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
        at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1158)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:656)
        at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:959)
        at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:900)
        at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:888)
        at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:854)
        at org.apache.ignite.internal.processors.cache.IgniteClusterActivateDeactivateTest.lambda$testConcurrentJoinAndActivate$4(IgniteClusterActivateDeactivateTest.java:601)
        at org.apache.ignite.internal.processors.cache.IgniteClusterActivateDeactivateTest$$Lambda$183/933337479.call(Unknown Source)
        at org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:84)
...
"grid-nio-worker-tcp-comm-3-#11059%cache.IgniteClusterActivateDeactivateTest5%" #13297 prio=5 os_prio=0 tid=0x00007f08f809f000 nid=0xc83 waiting on condition [0x00007f0a4688d000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x0000000095e1f5c0> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
        at org.apache.ignite.internal.util.IgniteUtils.await(IgniteUtils.java:7470)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.getSpiContext(TcpCommunicationSpi.java:2268)
        at org.apache.ignite.spi.IgniteSpiAdapter.getLocalNode(IgniteSpiAdapter.java:155)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeLocalNodeId(TcpCommunicationSpi.java:4016)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.nodeIdMessage(TcpCommunicationSpi.java:4009)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.access$300(TcpCommunicationSpi.java:271)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2.onConnected(TcpCommunicationSpi.java:415)
        at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onSessionOpened(GridNioFilterChain.java:251)
        at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
        at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onSessionOpened(GridNioCodecFilter.java:67)
        at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
        at org.apache.ignite.internal.util.nio.GridConnectionBytesVerifyFilter.onSessionOpened(GridConnectionBytesVerifyFilter.java:58)
        at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
        at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionOpened(GridNioServer.java:3503)
        at org.apache.ignite.internal.util.nio.GridNioFilterChain.onSessionOpened(GridNioFilterChain.java:139)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.register(GridNioServer.java:2616)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:1974)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1795)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at java.lang.Thread.run(Thread.java:748)
The hang occurs because the joining node is waiting for the state transition to complete, while one of the existing nodes cannot complete the exchange: it cannot establish a connection with the new node, because the new node's local node ID is not available yet (its NIO worker is blocked in getSpiContext while building the node ID message).
A separate question is why we need to wait for SPI context initialization to obtain the local node ID, given that the local node ID is generated long before the components start.
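To make the question above concrete, here is a minimal sketch of the two ways a component could expose the local node ID. It is a simplified model, not Ignite code: GatedSpi, EagerSpi, onContextInitialized and localNodeId are hypothetical names, and the timeout exists only so the example terminates (the thread dump shows an untimed CountDownLatch.await).

```java
import java.util.UUID;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchGatedNodeId {
    /** Hypothetical model: the node ID is reachable only through a context that is set late. */
    public static final class GatedSpi {
        private final CountDownLatch ctxInitLatch = new CountDownLatch(1);
        private volatile UUID ctxNodeId;

        /** Called when the (hypothetical) SPI context becomes available. */
        public void onContextInitialized(UUID nodeId) {
            ctxNodeId = nodeId;
            ctxInitLatch.countDown();
        }

        /** Returns the ID, or null if the context was not initialized within the timeout. */
        public UUID localNodeId(long timeoutMs) {
            try {
                return ctxInitLatch.await(timeoutMs, TimeUnit.MILLISECONDS) ? ctxNodeId : null;
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // Preserve the interrupt flag for callers.
                return null;
            }
        }
    }

    /** Hypothetical model: the node ID is handed over at construction, so no waiting is needed. */
    public static final class EagerSpi {
        private final UUID nodeId;

        public EagerSpi(UUID nodeId) {
            this.nodeId = nodeId;
        }

        public UUID localNodeId() {
            return nodeId;
        }
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID(); // Generated before any component starts.
        GatedSpi gated = new GatedSpi();
        EagerSpi eager = new EagerSpi(id);

        // The eager variant can answer a handshake immediately.
        System.out.println("eager before init: " + id.equals(eager.localNodeId()));
        // The gated variant cannot answer until the context is initialized.
        System.out.println("gated before init: " + (gated.localNodeId(100) != null));

        gated.onContextInitialized(id);
        System.out.println("gated after init: " + id.equals(gated.localNodeId(100)));
    }
}
```

In this model, a handshake served by GatedSpi stalls for as long as context initialization is delayed, which resembles the blocked grid-nio-worker thread in the dump; EagerSpi has no such dependency. Whether the real SPI can obtain the ID this early is exactly the open question.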
Issue Links
- relates to: IGNITE-10495 Get rid of local node ID in IgniteConfiguration (Open)