  1. James Server
  2. JAMES-3660

Cassandra mailbox creation unstable under high concurrency

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.7.0
    • Component/s: None
    • Labels: None

    Description

      org.apache.james.mailbox.cassandra.CassandraMailboxManagerTest$WithBatchSize.creatingConcurrentlyMailboxesWithSameParentShouldNotFail

      This test is enough to trigger instability on the Apache CI:

      https://ci-builds.apache.org/job/james/job/ApacheJames/job/PR-685/1/

      Error Message
      
      java.lang.RuntimeException: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
      
      Stacktrace
      
      java.util.concurrent.ExecutionException: java.lang.RuntimeException: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
      Caused by: java.lang.RuntimeException: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
      Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
      Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
      
      Standard Output
      
      11:29:54.751 [ERROR] o.a.j.u.c.ConcurrentTestRunner - Error caught during concurrent testing (iteration 0, threadNumber 1)
      com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
      	at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:90)
      	at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:65)
      	at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:297)
      	at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:268)
      	at com.datastax.shaded.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
      	... 25 common frames omitted
      Wrapped by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency SERIAL (1 responses were required but only 0 replica responded)
      

      In short, the use of lightweight transactions (LWT) is enough to create contention.
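To make the contention pattern concrete, here is a minimal in-memory sketch, not James code. It assumes that each child mailbox creation also issues a conditional insert for the shared parent, and it models `INSERT ... IF NOT EXISTS` with `ConcurrentHashMap.putIfAbsent`. The class `LwtContentionSketch` and all of its methods are hypothetical names introduced for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a table written with conditional inserts (LWT).
final class LwtContentionSketch {
    private final ConcurrentMap<String, Boolean> rows = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, AtomicInteger> attempts = new ConcurrentHashMap<>();

    // Models INSERT ... IF NOT EXISTS and records one attempt against the key.
    boolean insertIfNotExists(String path) {
        attempts.computeIfAbsent(path, key -> new AtomicInteger()).incrementAndGet();
        return rows.putIfAbsent(path, Boolean.TRUE) == null;
    }

    // Assumption: creating a child also ensures that its parent exists.
    void createMailbox(String parent, String child) {
        insertIfNotExists(parent);
        insertIfNotExists(parent + "." + child);
    }

    int attemptsOn(String path) {
        AtomicInteger count = attempts.get(path);
        return count == null ? 0 : count.get();
    }

    int mailboxCount() {
        return rows.size();
    }

    // Starts `threads` creations at the same time, all under the same parent.
    static LwtContentionSketch runConcurrently(int threads) {
        LwtContentionSketch store = new LwtContentionSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        List<Future<Void>> results = new ArrayList<>();
        try {
            for (int i = 0; i < threads; i++) {
                String child = "child" + i;
                Callable<Void> task = () -> {
                    start.await();
                    store.createMailbox("parent", child);
                    return null;
                };
                results.add(pool.submit(task));
            }
            start.countDown();
            for (Future<Void> result : results) {
                result.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return store;
    }

    public static void main(String[] args) {
        LwtContentionSketch store = runConcurrently(8);
        System.out.println("conditional writes on parent: " + store.attemptsOn("parent"));
        System.out.println("mailboxes created: " + store.mailboxCount());
    }
}
```

In this model every thread targets the same `parent` key, so all conditional writes on it compete with each other. An in-memory map resolves that cheaply, whereas in Cassandra each of those writes would go through Paxos on the same partition.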

      Looking closer at the issue, StoreMailboxManager performs numerous defensive SERIAL reads (each doing an empty Paxos commit), which further degrades performance and increases contention.

      I believe removing these defensive reads would make our code more stable.
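As a rough illustration of what removing the defensive reads saves, the toy model below counts operations that would require a Paxos round. It is a sketch under stated assumptions, not the StoreMailboxManager implementation: `DefensiveReadSketch`, `serialRead`, and `insertIfNotExists` are hypothetical names, and it assumes every SERIAL read and every conditional insert costs exactly one round.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for a Cassandra table; not James code.
final class DefensiveReadSketch {
    private final Set<String> rows = new HashSet<>();
    // Number of operations that would need a Paxos round (SERIAL read or LWT).
    private int paxosRounds = 0;

    // Models a read at consistency SERIAL.
    boolean serialRead(String path) {
        paxosRounds++;
        return rows.contains(path);
    }

    // Models INSERT ... IF NOT EXISTS; returns true when the row was applied.
    boolean insertIfNotExists(String path) {
        paxosRounds++;
        return rows.add(path);
    }

    // Defensive style: check existence first, then insert conditionally.
    boolean createDefensively(String path) {
        if (serialRead(path)) {
            return false;
        }
        return insertIfNotExists(path);
    }

    // Direct style: rely on the result of the conditional insert alone.
    boolean createDirectly(String path) {
        return insertIfNotExists(path);
    }

    int paxosRounds() {
        return paxosRounds;
    }

    public static void main(String[] args) {
        DefensiveReadSketch defensive = new DefensiveReadSketch();
        DefensiveReadSketch direct = new DefensiveReadSketch();
        for (int i = 0; i < 10; i++) {
            defensive.createDefensively("parent.child" + i);
            direct.createDirectly("parent.child" + i);
        }
        System.out.println("defensive rounds: " + defensive.paxosRounds()); // prints 20
        System.out.println("direct rounds: " + direct.paxosRounds());       // prints 10
    }
}
```

Both styles return the same result to the caller, because the conditional insert already reports whether the row existed. The defensive check only adds a second round per new mailbox in this model; real Paxos costs depend on the cluster and on how many proposals collide.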

      Removing them made the creatingConcurrentlyMailboxesWithSameParentShouldNotFail test about twice as fast (x2).

            People

              Assignee: Unassigned
              Reporter: Benoit Tellier (btellier)

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 20m