Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
3.0.0-beta1
-
None
-
Docs Required, Release Notes Required
Description
I tried to run a SQL benchmark (slightly modified Benchbase TPC-C scenario) against Ignite3 (commit hash b48ddcba7cd2bd3b9a053ae131c25b44a0400e27).
Ignite3 was running in a single-node mode with -Xmx8G -Xms8G memory limit (increased memory size comparing to IGNITE-19412). After some time it has become unavailable to the client. Log is filled with error messages like the following:
2023-05-03 17:51:57:705 +0300 [WARNING][%node0%JRaft-FSMCaller-Disruptor-_stripe_10-0][NodeImpl] Node <e92d1429-965a-4cbd-af37-126844f352ca_part_18/node0> got error: Error [type=ERROR_TYPE_LOG, status=Status[EBUSY<1009>: LogManager is busy, disk queue overload.]]. 2023-05-03 17:51:57:705 +0300 [WARNING][CompletableFutureDelayScheduler][RaftGroupServiceImpl] Recoverable error during the request type=ActionRequestImpl occurred (will be retried on the randomly selected node): java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367) at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376) at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) at java.base/java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: java.util.concurrent.TimeoutException ... 7 more 2023-05-03 17:51:57:705 +0300 [ERROR][%node0%JRaft-LogManager-Disruptor-_stripe_10-0][NodeImpl] Node <e92d1429-965a-4cbd-af37-126844f352ca_part_5/node0> append [2268, 2268] failed, status=Status[EIO<1014>: Corrupted LogStorage]. 2023-05-03 17:51:57:705 +0300 [WARNING][%node0%JRaft-FSMCaller-Disruptor-_stripe_10-0][FSMCallerImpl] FSMCaller already in error status, ignore new error Error [type=ERROR_TYPE_LOG, status=Status[EBUSY<1009>: LogManager is busy, disk queue overload.]] at org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.reportError(LogManagerImpl.java:587) at org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.appendEntries(LogManagerImpl.java:337) at org.apache.ignite.raft.jraft.core.NodeImpl.executeApplyingTasks(NodeImpl.java:1621) at org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosureHandler.onEvent(NodeImpl.java:305) at org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosureHandler.onEvent(NodeImpl.java:288) at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:217) at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:181) at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) at java.base/java.lang.Thread.run(Thread.java:829) 2023-05-03 17:51:57:706 +0300 [INFO][%node0%JRaft-FSMCaller-Disruptor-_stripe_10-0][ReplicatorGroupImpl] Fail to find the next candidate. 2023-05-03 17:51:57:706 +0300 [ERROR][%node0%JRaft-LogManager-Disruptor-_stripe_10-0][NodeImpl] Node <e92d1429-965a-4cbd-af37-126844f352ca_part_18/node0> append [2271, 2271] failed, status=Status[EIO<1014>: Corrupted LogStorage]. 2023-05-03 17:51:57:706 +0300 [INFO][%node0%JRaft-FSMCaller-Disruptor-_stripe_10-0][StateMachineAdapter] onLeaderStop: status=Status[EBADNODE<10009>: Raft node(leader or candidate) is in error.]. 2023-05-03 17:51:57:706 +0300 [ERROR][%node0%JRaft-LogManager-Disruptor-_stripe_10-0][NodeImpl] Node <30181391-2f04-40c0-8c35-a77592b0731f_part_5/node0> append [2285, 2285] failed, status=Status[EIO<1014>: Corrupted LogStorage].
More errors could be observed in the attached log archive.
Actually the log storage is not corrupted but the queue is overloaded. There are some performance improvements in main JRaft repository ( https://github.com/sofastack/sofa-jraft ) that may fix that, more exactly it is commit 5de2fbbcabb70ddbefb06a1d3737821781c3e85c in sofa-jraft repo.
Attachments
Attachments
Issue Links
- is a clone of
-
IGNITE-19412 Potential memory leak in MailboxRegistryImpl under intensive SQL load
- Resolved
- relates to
-
IGNITE-19275 Creation of big amount of tables throws exception
- Resolved