IGNITE-8785

Node may hang indefinitely in CONNECTING state during cluster segmentation

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.5
    • Fix Version/s: None
    • Component/s: cache
    • Labels: None

    Description

      Affected test: org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest#testTopologyValidatorWithCacheGroup

      The node hangs with the following stack trace:

      "grid-starter-testTopologyValidatorWithCacheGroup-22" #117619 prio=5 os_prio=0 tid=0x00007f17dd19b800 nid=0x304a in Object.wait() [0x00007f16b19df000]
         java.lang.Thread.State: TIMED_WAITING (on object monitor)
      	at java.lang.Object.wait(Native Method)
      	at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:931)
      	- locked <0x0000000705ee4a60> (a java.lang.Object)
      	at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:373)
      	at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:1948)
      	at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
      	at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:915)
      	at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1739)
      	at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1046)
      	at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2014)
      	at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1723)
      	- locked <0x0000000705995ec0> (a org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
      	at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1151)
      	at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:649)
      	at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:882)
      	at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:845)
      	at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:833)
      	at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:799)
      	at org.apache.ignite.testframework.junits.GridAbstractTest$3.call(GridAbstractTest.java:742)
      	at org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86)
      

      It seems that the node never receives an acknowledgment from the coordinator.

      The following warning was logged before the hang:

      [org.apache.ignite:ignite-core] [2018-06-10 04:59:18,876][WARN ][grid-starter-testTopologyValidatorWithCacheGroup-22][IgniteCacheTopologySplitAbstractTest$SplitTcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]
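
      The stack trace shows the starter thread in a timed Object.wait() inside joinTopology, and the warning shows the join being repeated. A retry loop of that shape hangs indefinitely when the acknowledgment never arrives, unless it also enforces an overall deadline. The sketch below is only an illustration of such a bounded wait; it is not Ignite's ServerImpl code, and the class and method names (BoundedJoinWait, onJoinAcknowledged, awaitJoin) are hypothetical.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of a bounded join wait; not Ignite's actual implementation. */
public class BoundedJoinWait {
    private final Object mux = new Object();

    /** Set once the (assumed) coordinator acknowledgment has been processed. Guarded by mux. */
    private boolean connected;

    /** Called by whichever thread handles the acknowledgment. */
    public void onJoinAcknowledged() {
        synchronized (mux) {
            connected = true;
            mux.notifyAll();
        }
    }

    /**
     * Waits for the acknowledgment in slices of at most sliceMs,
     * giving up once joinTimeoutMs has elapsed overall.
     *
     * @return true if acknowledged, false if the overall deadline passed.
     */
    public boolean awaitJoin(long sliceMs, long joinTimeoutMs) throws InterruptedException {
        if (sliceMs <= 0)
            throw new IllegalArgumentException("sliceMs must be positive");

        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(joinTimeoutMs);

        synchronized (mux) {
            while (!connected) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());

                // Without this check the loop would keep re-entering wait() forever.
                if (remainingMs <= 0)
                    return false;

                // Never pass 0 to wait(): that would mean "wait with no timeout".
                mux.wait(Math.min(sliceMs, remainingMs));
            }

            return true;
        }
    }
}
```

      With this shape, a caller that gets false can fail node startup (or the test) with a clear error instead of blocking the starter thread.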
      

    People

    • Assignee: Unassigned
    • Reporter: Pavel Kovalenko (jokser)