Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
The org.apache.james.mailbox.cassandra.CassandraMailboxManagerTest$WithBatchSize.creatingConcurrentlyMailboxesWithSameParentShouldNotFail test alone is enough to trigger instability on the Apache CI:
https://ci-builds.apache.org/job/james/job/ApacheJames/job/PR-685/1/
Error Message
java.lang.RuntimeException: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)

Stacktrace
java.util.concurrent.ExecutionException: java.lang.RuntimeException: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
Caused by: java.lang.RuntimeException: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)

Standard Output
11:29:54.751 [ERROR] o.a.j.u.c.ConcurrentTestRunner - Error caught during concurrent testing (iteration 0, threadNumber 1)
com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
    at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:90)
    at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:65)
    at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:297)
    at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:268)
    at com.datastax.shaded.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
    ... 25 common frames omitted
Wrapped by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
In short, lightweight transaction (LWT) usage alone is enough to create contention.
Looking closer at the issue, StoreMailboxManager performs numerous defensive SERIAL reads (each doing an empty Paxos commit), which further degrade performance and increase contention.
I believe removing these defensive reads would make our code more stable.
Removing these reads made the creatingConcurrentlyMailboxesWithSameParentShouldNotFail test roughly twice as fast (x2).
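To illustrate why dropping the defensive reads could help, here is a minimal toy model in plain Java. It is not James code and does not use the Cassandra driver: the class LwtContentionSketch, its methods, and the mailbox path string are all hypothetical, and a ConcurrentHashMap stands in for a table written with a conditional insert. It only counts how many operations would each need a Paxos round, assuming that a SERIAL read and a conditional insert cost one round each.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model only: NOT James code and NOT the Cassandra driver API. */
final class LwtContentionSketch {
    // Stand-in for a table keyed by mailbox path (hypothetical schema).
    private final ConcurrentHashMap<String, String> table = new ConcurrentHashMap<>();
    // Counts operations assumed to need one Paxos round each.
    private final AtomicInteger serialOps = new AtomicInteger();

    /** Models a read at SERIAL consistency. */
    boolean serialRead(String path) {
        serialOps.incrementAndGet();
        return table.containsKey(path);
    }

    /** Models a conditional insert (IF NOT EXISTS); true when applied. */
    boolean insertIfNotExists(String path, String mailboxId) {
        serialOps.incrementAndGet();
        return table.putIfAbsent(path, mailboxId) == null;
    }

    /** Defensive variant: check existence first, then insert conditionally. */
    boolean createWithDefensiveRead(String path, String mailboxId) {
        if (serialRead(path)) {
            return false; // already exists, skip the insert
        }
        return insertIfNotExists(path, mailboxId);
    }

    /** Lean variant: rely on the conditional insert's own result. */
    boolean createTrustingLwt(String path, String mailboxId) {
        return insertIfNotExists(path, mailboxId);
    }

    int serialOps() {
        return serialOps.get();
    }

    public static void main(String[] args) {
        String path = "#private:bob:parent.child"; // hypothetical mailbox path

        LwtContentionSketch defensive = new LwtContentionSketch();
        defensive.createWithDefensiveRead(path, "id-1");
        defensive.createWithDefensiveRead(path, "id-2");

        LwtContentionSketch lean = new LwtContentionSketch();
        lean.createTrustingLwt(path, "id-1");
        lean.createTrustingLwt(path, "id-2");

        System.out.println("defensive serial ops: " + defensive.serialOps());
        System.out.println("lean serial ops: " + lean.serialOps());
    }
}
```

Both variants agree on which creation wins, since the conditional insert already reports whether it was applied; the lean variant simply issues fewer serial operations on the contended key. Real Cassandra behaviour under concurrency is more involved than this counter suggests.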