Uploaded image for project: 'Ignite'
  1. Ignite
  2. IGNITE-13145

NPE on node stop can cause JVM halt with default failure handler

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 2.9
    • None
    • None

    Description

      TcpCommunicationSpi have injected @IgniteLogger resource.  TcpCommunicationSpi.sendMessage0 tries to access this resource without null check, but when the node is stopping injected resources are cleaned up. If some threads are trying to send something via communication SPI on stopping node, NPE is thrown with subsequent failure handler call and JVM halt with default failure handler. This is undesirable when Ignite is running in embedded mode since all other JVM applications will be stopped.

      Error stack:

      [11:34:39,194][SEVERE][sys-stripe-11-#8570%280332bb-89f3-4227-9d14-c319bbf3ca0c%][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.NullPointerException]]
      java.lang.NullPointerException
          at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2898)
          at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2882)
          at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
          at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
          at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1257)
          at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1296)
          at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.sendDeferredUpdateResponse(GridDhtAtomicCache.java:3643)
          at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout.run(GridDhtAtomicCache.java:3889)
          at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:565)
          at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
          at java.base/java.lang.Thread.run(Thread.java:834)
      [11:34:39,214][SEVERE][sys-stripe-11-#8570%280332bb-89f3-4227-9d14-c319bbf3ca0c%][] JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.NullPointerException]]

       This problem occurs often on team-city in "Java thin client" suite (ReliabilityTest#testFailover).

      Attachments

        Issue Links

          Activity

            People

              alex_pl Aleksey Plekhanov
              alex_pl Aleksey Plekhanov
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m