  1. Ignite
  2. IGNITE-15154

Integration test suite intermittently hangs when shutting down a RAFT node

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 3.0.0-alpha3
    • None

    Description

      An example of the hang:
      https://ci.ignite.apache.org/buildConfiguration/ignite3_Test_IntegrationTests_IntegrationTests/6087991

      [20:15:22]W:             [org.apache.ignite:ignite-raft] 2021-07-15 20:15:07:183 +0300 [main] ERROR rejectedExecution - Failed to submit a listener notification task. Event loop shut down?
      [20:15:22]W:             [org.apache.ignite:ignite-raft] java.util.concurrent.RejectedExecutionException: event executor terminated
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:926)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:353)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:346)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:828)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:818)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:842)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:499)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:184)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:95)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:30)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at org.apache.ignite.internal.network.netty.NettyUtils.toCompletableFuture(NettyUtils.java:46)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at org.apache.ignite.internal.network.netty.NettyUtils.toCompletableFuture(NettyUtils.java:66)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at org.apache.ignite.internal.network.netty.NettyClient.lambda$stop$1(NettyClient.java:171)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at java.base/java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:946)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at java.base/java.util.concurrent.CompletableFuture.handle(CompletableFuture.java:2266)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at org.apache.ignite.internal.network.netty.NettyClient.stop(NettyClient.java:168)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at java.base/java.util.concurrent.ConcurrentHashMap$ValueSpliterator.forEachRemaining(ConcurrentHashMap.java:3605)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:312)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:734)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:550)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at java.base/java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at java.base/java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:517)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at org.apache.ignite.internal.network.netty.ConnectionManager.stop(ConnectionManager.java:240)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at org.apache.ignite.network.scalecube.ScaleCubeClusterServiceFactory$2.shutdown(ScaleCubeClusterServiceFactory.java:114)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at org.apache.ignite.raft.server.ITJRaftCounterServerTest$1.shutdown(ITJRaftCounterServerTest.java:167)
      [20:15:22]W:             [org.apache.ignite:ignite-raft]        at org.apache.ignite.raft.server.ITJRaftCounterServerTest.after(ITJRaftCounterServerTest.java:149)
      

      The root cause of this issue is the shutdown procedure in the test class.
      A test may shut down a server or client but forget to remove it from the corresponding collection (servers or clients). After the test, all started nodes are shut down by iterating over these collections, but if one of the nodes has already been stopped, the shutdown code hits an exception and hangs.
      This can be fixed by automatically removing a node from its collection when the node is shut down.
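The proposed fix can be sketched roughly as follows. This is only an illustration, not the code from ITJRaftCounterServerTest: the names SelfRemovingNodes, TestNode, and stopAll are hypothetical, and the real test works with RAFT servers and clients rather than this stand-in class. The idea is that a node deregisters itself when it is shut down, so the cleanup loop never touches a node that is already stopped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class SelfRemovingNodes {
    /** Hypothetical stand-in for a RAFT server or client started by a test. */
    static class TestNode {
        private final String name;
        private final List<TestNode> registry;
        private boolean stopped;

        TestNode(String name, List<TestNode> registry) {
            this.name = name;
            this.registry = registry;
            registry.add(this); // Register on start.
        }

        /** Stops the node and removes it from its collection; repeated calls are no-ops. */
        synchronized void shutdown() {
            if (stopped)
                return;

            stopped = true;
            registry.remove(this); // Deregister so cleanup code never sees a stopped node.
        }

        @Override public String toString() {
            return "TestNode[" + name + "]";
        }
    }

    /** Shuts down every node still registered and returns how many were stopped by this call. */
    static int stopAll(List<TestNode> registry) {
        int stoppedNow = 0;

        // Iterate over a snapshot because shutdown() modifies the registry.
        for (TestNode node : new ArrayList<>(registry)) {
            node.shutdown();
            stoppedNow++;
        }

        return stoppedNow;
    }

    public static void main(String[] args) {
        List<TestNode> servers = new CopyOnWriteArrayList<>();

        TestNode a = new TestNode("a", servers);
        TestNode b = new TestNode("b", servers);

        a.shutdown(); // A test stops one node itself and does nothing else.

        int stoppedInCleanup = stopAll(servers); // The "after" hook only sees node b.

        System.out.println(stoppedInCleanup + " " + servers.size()); // prints "1 0"
    }
}
```

With this arrangement a test does not need to remember to update the servers or clients collection after stopping a node by hand, and a second shutdown of the same node does nothing.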

            People

              Assignee: Vladislav Pyatkov (v.pyatkov)
              Reporter: Vladislav Pyatkov (v.pyatkov)
                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 40m