Details
Bug
Status: Changes Suggested
Normal
Resolution: Unresolved
None
Degradation - Performance Bug/Regression
Normal
Normal
User Report
All
None
Description
While profiling Cassandra 4.1.3 on large AWS instances, we measured a high number of lock acquires in the `getCapacity` function of `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 seconds). In our tests on r8g.24xlarge instances (Ubuntu 22.04), this contention keeps the CPU utilization of the system under 50% at full load and therefore limits the achieved throughput.
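To illustrate the pattern described above, here is a minimal sketch of a cache whose capacity getter takes a lock while its size getter does not. `SketchCache`, `evictionLock`, and `lockAcquires` are hypothetical names made up for this example; this is not the code of `InstrumentingCache` or of the cache it wraps, only an assumption about the general shape of the problem. Note that `size() > 0` is false for an empty cache, so the two checks are not interchangeable in general.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative only: a hypothetical cache, NOT Cassandra's InstrumentingCache. */
public class LockedCapacitySketch {

    static final class SketchCache {
        private final ReentrantLock evictionLock = new ReentrantLock();
        private final ConcurrentHashMap<Long, Long> map = new ConcurrentHashMap<>();
        private final long capacity;
        // Counts how often the lock is taken, mimicking what a profiler would report.
        final AtomicLong lockAcquires = new AtomicLong();

        SketchCache(long capacity) {
            this.capacity = capacity;
        }

        /** Assumed shape of the hot getter: every call serializes on one lock. */
        long getCapacity() {
            evictionLock.lock();
            try {
                lockAcquires.incrementAndGet();
                return capacity;
            } finally {
                evictionLock.unlock();
            }
        }

        /** Reads the entry count without taking evictionLock. */
        int size() {
            return map.size();
        }

        void put(long key, long value) {
            map.put(key, value);
        }
    }

    /** Runs the "is the cache usable?" check from several threads; returns total lock acquires. */
    static long run(boolean useCapacity, int threads, int iterations) throws InterruptedException {
        SketchCache cache = new SketchCache(1024);
        cache.put(1L, 1L);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                long enabled = 0;
                for (int i = 0; i < iterations; i++) {
                    boolean on = useCapacity ? cache.getCapacity() > 0 : cache.size() > 0;
                    if (on) {
                        enabled++;
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return cache.lockAcquires.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("lock acquires, getCapacity check: " + run(true, 8, 100_000));
        System.out.println("lock acquires, size check:        " + run(false, 8, 100_000));
    }
}
```

In this sketch, the `getCapacity` variant takes the lock once per check, so the acquire count grows with threads times iterations, while the `size` variant never touches the lock.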
Removing the lock contention in SSTableReader.java by replacing the call to `getCapacity` with `size` yields up to a 2.95x increase in throughput on r8g.24xlarge and about 2x on r7i.24xlarge:
| Instance type | Cass 4.1.3 | Cass 4.1.3 patched |
| r8g.24xlarge | 168k ops | 496k ops (2.95x) |
| r7i.24xlarge | 153k ops | 304k ops (1.98x) |
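As a quick sanity check on the table, the speedup factors can be recomputed from the rounded throughput figures. The `SpeedupCheck` class and its `speedup` helper are hypothetical names used only here. Because the inputs are rounded to the nearest thousand ops, the recomputed ratios (about 2.95 and 1.99) can differ from the reported ones in the last digit.

```java
/** Recomputes the speedup ratios from the rounded "k ops" values in the table. */
public class SpeedupCheck {

    /** Ratio of patched throughput to baseline throughput (same units for both). */
    static double speedup(double baselineKops, double patchedKops) {
        return patchedKops / baselineKops;
    }

    public static void main(String[] args) {
        System.out.printf("r8g.24xlarge: %.2fx%n", speedup(168, 496));
        System.out.printf("r7i.24xlarge: %.2fx%n", speedup(153, 304));
    }
}
```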
Instructions to reproduce:
## Requirements for Ubuntu 22.04
sudo apt install -y ant git openjdk-11-jdk

## Build and run
CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f -R

# Run
bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write n=10000000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log -graph file=cload.html && \
bin/nodetool compact keyspace1 && sleep 30s && \
tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=406 -node localhost -log file=result.log -graph file=graph.html