[IGNITE-19419] Internal queue overload under intensive SQL load - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.0.0-beta1
Fix Version/s: None
Component/s: sql
Labels:
- ignite-3
- storage-engine

Ignite Flags:

Docs Required, Release Notes Required

Description

I tried to run a SQL benchmark (slightly modified Benchbase TPC-C scenario) against Ignite3 (commit hash b48ddcba7cd2bd3b9a053ae131c25b44a0400e27).

Ignite3 was running in a single-node mode with -Xmx8G -Xms8G memory limit (increased memory size comparing to ~~IGNITE-19412~~). After some time it has become unavailable to the client. Log is filled with error messages like the following:

2023-05-03 17:51:57:705 +0300 [WARNING][%node0%JRaft-FSMCaller-Disruptor-_stripe_10-0][NodeImpl] Node <e92d1429-965a-4cbd-af37-126844f352ca_part_18/node0> got error: Error [type=ERROR_TYPE_LOG, status=Status[EBUSY<1009>: LogManager is busy, disk queue overload.]].
2023-05-03 17:51:57:705 +0300 [WARNING][CompletableFutureDelayScheduler][RaftGroupServiceImpl] Recoverable error during the request type=ActionRequestImpl occurred (will be retried on the randomly selected node): 
java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
	at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
	at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
	at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
	at java.base/java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException
	... 7 more
2023-05-03 17:51:57:705 +0300 [ERROR][%node0%JRaft-LogManager-Disruptor-_stripe_10-0][NodeImpl] Node <e92d1429-965a-4cbd-af37-126844f352ca_part_5/node0> append [2268, 2268] failed, status=Status[EIO<1014>: Corrupted LogStorage].
2023-05-03 17:51:57:705 +0300 [WARNING][%node0%JRaft-FSMCaller-Disruptor-_stripe_10-0][FSMCallerImpl] FSMCaller already in error status, ignore new error
Error [type=ERROR_TYPE_LOG, status=Status[EBUSY<1009>: LogManager is busy, disk queue overload.]]
	at org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.reportError(LogManagerImpl.java:587)
	at org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.appendEntries(LogManagerImpl.java:337)
	at org.apache.ignite.raft.jraft.core.NodeImpl.executeApplyingTasks(NodeImpl.java:1621)
	at org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosureHandler.onEvent(NodeImpl.java:305)
	at org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosureHandler.onEvent(NodeImpl.java:288)
	at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:217)
	at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:181)
	at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137)
	at java.base/java.lang.Thread.run(Thread.java:829)
2023-05-03 17:51:57:706 +0300 [INFO][%node0%JRaft-FSMCaller-Disruptor-_stripe_10-0][ReplicatorGroupImpl] Fail to find the next candidate.
2023-05-03 17:51:57:706 +0300 [ERROR][%node0%JRaft-LogManager-Disruptor-_stripe_10-0][NodeImpl] Node <e92d1429-965a-4cbd-af37-126844f352ca_part_18/node0> append [2271, 2271] failed, status=Status[EIO<1014>: Corrupted LogStorage].
2023-05-03 17:51:57:706 +0300 [INFO][%node0%JRaft-FSMCaller-Disruptor-_stripe_10-0][StateMachineAdapter] onLeaderStop: status=Status[EBADNODE<10009>: Raft node(leader or candidate) is in error.].
2023-05-03 17:51:57:706 +0300 [ERROR][%node0%JRaft-LogManager-Disruptor-_stripe_10-0][NodeImpl] Node <30181391-2f04-40c0-8c35-a77592b0731f_part_5/node0> append [2285, 2285] failed, status=Status[EIO<1014>: Corrupted LogStorage].

More errors could be observed in the attached log archive.

Actually the log storage is not corrupted but the queue is overloaded. There are some performance improvements in main JRaft repository ( https://github.com/sofastack/sofa-jraft ) that may fix that, more exactly it is commit 5de2fbbcabb70ddbefb06a1d3737821781c3e85c in sofa-jraft repo.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

ai3.jstack
04/May/23 12:56
383 kB
Andrey Khitrin
logs.zip
04/May/23 12:56
1.28 MB
Andrey Khitrin

Issue Links

is a clone of

IGNITE-19412 Potential memory leak in MailboxRegistryImpl under intensive SQL load

Resolved

relates to

IGNITE-19275 Creation of big amount of tables throws exception

Resolved

Internal queue overload under intensive SQL load

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates