Apache Cassandra / CASSANDRA-19429

Remove lock contention generated by getCapacity function in SSTableReader


Details

    • Bug
    • Status: Changes Suggested
    • Normal
    • Resolution: Unresolved
    • 4.0.x, 4.1.x
    • Local/SSTable
    • None
    • Degradation - Performance Bug/Regression
    • Normal
    • Normal
    • User Report
    • All
    • None
    • ci

    Description

      Profiling Cassandra 4.1.3 on large AWS instances shows a high number of lock acquisitions in the `getCapacity` function of `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquisitions per 60 seconds). In our tests on r8g.24xlarge instances (Ubuntu 22.04), this contention keeps CPU utilization under 50% at full load and therefore caps the achievable throughput.

      Removing the lock contention in SSTableReader.java by replacing the call to `getCapacity` with `size` yields up to a 2.95x throughput increase on r8g.24xlarge and about 2x on r7i.24xlarge:

      Instance type    Cass 4.1.3    Cass 4.1.3 patched
      r8g.24xlarge     168k ops      496k ops (2.95x)
      r7i.24xlarge     153k ops      304k ops (1.98x)
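
      To illustrate why swapping the call matters, here is a minimal, self-contained Java sketch. It is not Cassandra code: `ToyCache`, `evictionLock`, and `countLockAcquires` are hypothetical names, and the sketch simply assumes that reading the capacity goes through a lock shared by all reader threads while reading the size does not. Under that assumption, every "is the cache enabled?" check on the read path costs one acquisition of the shared lock.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

public class LockContentionSketch {

    /** Hypothetical stand-in for a cache; NOT Cassandra's InstrumentingCache. */
    static final class ToyCache {
        private final ReentrantLock evictionLock = new ReentrantLock();
        private final AtomicLong lockAcquires = new AtomicLong();
        private final AtomicLong entries = new AtomicLong();
        private final long capacity;

        ToyCache(long capacity) { this.capacity = capacity; }

        void put() { entries.incrementAndGet(); }

        /** Assumed behaviour: reading the capacity takes the shared lock. */
        long getCapacity() {
            evictionLock.lock();
            try {
                lockAcquires.incrementAndGet(); // count each acquisition
                return capacity;
            } finally {
                evictionLock.unlock();
            }
        }

        /** Lock-free read of the current entry count. */
        long size() { return entries.get(); }

        long lockAcquires() { return lockAcquires.get(); }
    }

    /** Runs `threads` workers, each performing `checksPerThread` enabled-checks,
     *  and returns how many times the shared lock was acquired. */
    static long countLockAcquires(int threads, int checksPerThread, boolean useGetCapacity) {
        ToyCache cache = new ToyCache(1024);
        cache.put(); // one entry, so size() > 0 holds
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < checksPerThread; i++) {
                    boolean enabled = useGetCapacity ? cache.getCapacity() > 0
                                                     : cache.size() > 0;
                    if (!enabled) throw new IllegalStateException("cache unexpectedly disabled");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES))
                throw new IllegalStateException("workers timed out");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return cache.lockAcquires();
    }

    public static void main(String[] args) {
        System.out.println("getCapacity: " + countLockAcquires(8, 100_000, true) + " lock acquires");
        System.out.println("size:        " + countLockAcquires(8, 100_000, false) + " lock acquires");
    }
}
```

      With 8 threads and 100,000 checks each, the `getCapacity` variant acquires the lock 800,000 times and the `size` variant never does. Note that in this toy `size() > 0` is false for an empty cache even when its capacity is non-zero, so the two checks are not interchangeable in general.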

       

      Instructions to reproduce:

      ## Requirements for Ubuntu 22.04
      sudo apt install -y ant git openjdk-11-jdk

      ## Build, then start Cassandra in the foreground (-f; -R allows running as root)
      CASSANDRA_USE_JDK11=true ant realclean && \
      CASSANDRA_USE_JDK11=true ant jar && \
      CASSANDRA_USE_JDK11=true ant stress-build && \
      rm -rf data && \
      bin/cassandra -f -R

      ## Run the benchmark from a second terminal: reset, load, compact, then mixed read/write
      bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
      bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
      bin/nodetool clearsnapshot --all && \
      tools/bin/cassandra-stress write n=10000000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log -graph file=cload.html && \
      bin/nodetool compact keyspace1 && sleep 30s && \
      tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=406 -node localhost -log file=result.log -graph file=graph.html
      

      Attachments

        1. asprof_cass4.1.3__lock_20240216052912lock.html
          10 kB
          Dipietro Salvatore
        2. Screenshot 2024-02-26 at 10.27.10.png
          191 kB
          Dipietro Salvatore
        3. Screenshot 2024-02-27 at 11.29.41.png
          193 kB
          Dipietro Salvatore
        4. image-2024-03-08-15-51-30-439.png
          624 kB
          Jon Haddad
        5. image-2024-03-08-15-52-07-902.png
          657 kB
          Jon Haddad
        6. Screenshot 2024-03-19 at 15.22.50.png
          207 kB
          Dipietro Salvatore

        Activity

          People

            dipiets Dipietro Salvatore
            dipiets Dipietro Salvatore
            Dipietro Salvatore
            Maxim Muzafarov, Stefan Miklosovic

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 3h 50m