IGNITE-7476

Server node will join with failure gathering metrics

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.5
    • Component/s: None
    • Labels: None

      Description

      Sometimes a server node fails with the following trace:

      SEVERE: TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node in order to prevent cluster wide instability.
      java.lang.NullPointerException
          at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$7.cacheMetrics(GridDiscoveryManager.java:1149)
          at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMetricsUpdateMessage(ServerImpl.java:5022)
          at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2690)
          at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2491)
          at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6675)
          at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2574)
          at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)

      Two problems here:

      • An uncaught exception in cacheMetrics() leads to unconditional failure of the node, because it is thrown in the discovery thread. We should probably wrap all non-trivial code there in try-catch.
      • Lack of proper locking when destroying a cache (see also IGNITE-6580, IGNITE-7278 and IGNITE-7165)
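
      The first point could be sketched roughly as follows. This is only an illustration of the try-catch idea, not the actual fix: the class SafeMetricsSketch, the method safeMetrics() and the Map<Integer, String> metrics type are hypothetical stand-ins and are not part of the Ignite code base.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch: keeps a failing metrics provider from killing the calling thread. */
public class SafeMetricsSketch {
    /**
     * Calls the provider and falls back to an empty map if it throws or returns null.
     * The Map<Integer, String> type is a stand-in for per-cache metrics (assumption).
     */
    public static Map<Integer, String> safeMetrics(Supplier<Map<Integer, String>> provider) {
        try {
            Map<Integer, String> res = provider.get();

            return res != null ? res : Collections.<Integer, String>emptyMap();
        }
        catch (RuntimeException e) {
            // Log and skip this metrics update instead of letting the exception escape.
            System.err.println("Failed to gather cache metrics, skipping update: " + e);

            return Collections.<Integer, String>emptyMap();
        }
    }

    public static void main(String[] args) {
        // Simulates a cache that was destroyed concurrently with metrics gathering.
        Map<Integer, String> broken = safeMetrics(() -> {
            throw new NullPointerException("cache already destroyed");
        });

        Map<Integer, String> ok = safeMetrics(() -> Collections.singletonMap(1, "metrics"));

        System.out.println("broken is empty: " + broken.isEmpty());
        System.out.println("ok size: " + ok.size());
    }
}
```

      With a wrapper like this, a concurrent cache destroy would cost one skipped metrics update rather than the whole node. It does not address the second point; the locking around cache destruction still needs its own fix.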

       

              People

              • Assignee: Ilya Kasnacheev (ilyak)
              • Reporter: Ilya Kasnacheev (ilyak)
              • Votes: 0
              • Watchers: 2

                Dates

                • Created:
                  Updated:
                  Resolved: