Ignite / IGNITE-7944

Disconnected client node tries to send JOB_CANCEL message


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.9, 2.3
    • Fix Version/s: 2.5
    • Component/s: messaging
    • Labels: None

    Description

      If the network is blocked (socket connections are not closed) and a failure is detected, the tcp-client-disco-msg-worker thread can get stuck while creating a TCP client:

      "tcp-client-disco-msg-worker-#4%wd5prsvtots0016a-tg-QueryFabric%" #494 prio=5 os_prio=0 tid=0x00007f94c067c800 nid=0x2bdf runnable [0x00007f960ecf1000]
      java.lang.Thread.State: RUNNABLE
      at sun.nio.ch.Net.poll(Native Method)
      at sun.nio.ch.SocketChannelImpl.poll(SocketChannelImpl.java:954)
      - locked <0x00007fa140f520c0> (a java.lang.Object)
      at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:110)
      - locked <0x00007fa140f520b0> (a java.lang.Object)
      at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2950)
      at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2681)
      at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2568)
      at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2429)
      at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2393)
      at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1590)
      at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1659)
      at org.apache.ignite.internal.processors.task.GridTaskWorker.cancelChildren(GridTaskWorker.java:1305)
      at org.apache.ignite.internal.processors.task.GridTaskWorker.finishTask(GridTaskWorker.java:1609)
      at org.apache.ignite.internal.processors.task.GridTaskWorker.finishTask(GridTaskWorker.java:1581)
      at org.apache.ignite.internal.processors.task.GridTaskProcessor.onDisconnected(GridTaskProcessor.java:168)
      at org.apache.ignite.internal.IgniteKernal.onDisconnected(IgniteKernal.java:3460)
      at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery(GridDiscoveryManager.java:601)
      at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.notifyDiscovery(ClientImpl.java:2407)
      at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.notifyDiscovery(ClientImpl.java:2386)
      at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1714)
      at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
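The trace shows the whole chain running synchronously on the discovery message worker: ClientImpl$MessageWorker.body calls the discovery listener, which calls IgniteKernal.onDisconnected, GridTaskProcessor.onDisconnected, GridTaskWorker.finishTask and cancelChildren, and finally blocks in TcpCommunicationSpi.createTcpClient. The sketch below is a hypothetical, self-contained model using plain JDK classes only (no Ignite API; the worker, the event names and runWorker are all invented for illustration). It shows the general effect: when a listener blocks on a single-threaded event worker, every event queued behind it waits until the listener returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

/** Hypothetical model of a single-threaded discovery message worker; not Ignite code. */
public class BlockingListenerDemo {
    /**
     * Processes events in order on one worker thread, calling the listener inline,
     * and returns the time (ms since start) at which each event finished.
     */
    static List<Long> runWorker(List<String> events, Consumer<String> listener) {
        Queue<String> queue = new ConcurrentLinkedQueue<>(events);
        List<Long> finishedAt = new ArrayList<>();
        long start = System.nanoTime();
        Thread worker = new Thread(() -> {
            String evt;
            while ((evt = queue.poll()) != null) {
                listener.accept(evt); // the listener runs on the worker thread itself
                finishedAt.add((System.nanoTime() - start) / 1_000_000);
            }
        }, "hypothetical-disco-msg-worker");
        worker.start();
        try {
            worker.join(); // join() makes the worker's writes to finishedAt visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return finishedAt;
    }

    public static void main(String[] args) {
        long blockMs = 300; // stands in for a connect attempt that waits out its timeout
        List<Long> times = runWorker(List.of("DISCONNECTED", "RECONNECTED"), evt -> {
            if (evt.equals("DISCONNECTED")) {
                try {
                    Thread.sleep(blockMs); // simulated blocking send, e.g. a JOB_CANCEL attempt
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        System.out.println("RECONNECTED handled after ~" + times.get(1) + " ms");
    }
}
```

In this model the second event cannot be handled earlier than blockMs after start, because the only worker thread is occupied inside the first listener call.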
      

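      It looks like the msg-worker thread tries to send a JOB_CANCEL message for each job, and each attempt can block for up to failureDetectionTimeout. Because these sends run one after another inside the disconnect notification, the worker may stay blocked for roughly (number of jobs) × failureDetectionTimeout.

      A reproducer is attached.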
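The cost described above is additive: with N remote jobs and a per-send connect timeout T, sequential cancellation can hold the calling thread for about N × T. The following is a hypothetical Java model, not Ignite's GridTaskWorker: sendCancel, cancelChildren and the skipRemoteCancel flag are invented for illustration, sleeping stands in for a connect attempt to an unreachable node, and skipping the sends is only one possible mitigation, not necessarily the committed fix.

```java
import java.util.List;

/** Hypothetical model of cancelling remote child jobs one at a time; not Ignite's GridTaskWorker. */
public class CancelChildrenDemo {
    /** Simulates a cancel send to an unreachable node: the connect attempt waits out timeoutMs and fails. */
    static boolean sendCancel(String jobId, long timeoutMs) {
        try {
            Thread.sleep(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false; // the JOB_CANCEL request was never delivered
    }

    /**
     * Cancels jobs sequentially on the calling thread and returns the elapsed time in ms.
     * If skipRemoteCancel is true, no send is attempted (hypothetical mitigation for a disconnected client).
     */
    static long cancelChildren(List<String> jobIds, long timeoutMs, boolean skipRemoteCancel) {
        long start = System.nanoTime();
        if (!skipRemoteCancel) {
            for (String jobId : jobIds)
                sendCancel(jobId, timeoutMs); // each failed send costs up to timeoutMs
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        List<String> jobs = List.of("job-1", "job-2", "job-3");
        long timeoutMs = 200; // scaled-down stand-in for failureDetectionTimeout
        System.out.println("send to every job: ~" + cancelChildren(jobs, timeoutMs, false) + " ms");
        System.out.println("skip when disconnected: ~" + cancelChildren(jobs, timeoutMs, true) + " ms");
    }
}
```

With three jobs and a 200 ms stand-in timeout, the first call takes at least about 600 ms while the second returns almost immediately; the blocking time of the naive loop grows linearly with the number of jobs.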

      Attachments

        1. Reproducer7944.java (10 kB, Roman Guseinov)

            People

              Assignee: Roman Guseinov (guseinov)
              Reporter: Roman Guseinov (guseinov)
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved: