Cassandra / CASSANDRA-10229

Fix cassandra-stress gaussian behaviour for shuffling the distribution, to mitigate read perf after a major compaction


Details

    • Type: Improvement
    • Status: Open
    • Priority: Low
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: Tool/stress

    Description

      TITLE WAS: BAD READ PERFORMANCE AFTER A MAJOR COMPACTION

      I am trying to understand what I am seeing. My scenario is very basic: a simple users table with the key cache and row cache disabled. I write 50M elements, then read 5M random elements. The read performance is not that bad BEFORE a major compaction of the data, but I see a ~3x performance regression AFTER I run a major compaction.
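      (It is not stated above exactly how the caches were disabled; as a rough sketch only, here are two ways to get a table with no key cache and no row cache. The keyspace/table names and the cassandra.yaml route are assumptions, the attached users-caching.yaml defines the real schema.)

      # Per table, via CQL (names are hypothetical):
      cqlsh 127.0.0.1 -e "ALTER TABLE stresscql.users WITH caching = {'keys': 'NONE', 'rows_per_partition': 'NONE'};"

      # Or globally, in cassandra.yaml (requires a restart):
      #   key_cache_size_in_mb: 0
      #   row_cache_size_in_mb: 0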

      Here are the read performance numbers for my scenario:

      3.0 before a major compaction (Key cache and row cache disabled). Note that these are the numbers from reading 50M; I see the same with 5M.
      ==================================================================================
      
      Results:
      op rate                   : 9149 [read:9149]
      partition rate            : 9149 [read:9149]
      row rate                  : 9149 [read:9149]
      latency mean              : 32.8 [read:32.8]
      latency median            : 31.2 [read:31.2]
      latency 95th percentile   : 47.2 [read:47.2]
      latency 99th percentile   : 55.0 [read:55.0]
      latency 99.9th percentile : 66.4 [read:66.4]
      latency max               : 305.4 [read:305.4]
      Total partitions          : 50000000 [read:50000000]
      Total errors              : 0 [read:0]
      total gc count            : 0
      total gc mb               : 0
      total gc time (s)         : 0
      avg gc time(ms)           : NaN
      stdev gc time(ms)         : 0
      Total operation time      : 01:31:05
      END
      
      -rw-rw-r-- 1 aboudreault aboudreault  4.7G Aug 31 08:51 ma-1024-big-Data.db
      -rw-rw-r-- 1 aboudreault aboudreault  4.9G Aug 31 09:08 ma-1077-big-Data.db
      
      3.0 after a major compaction (Key cache and row cache disabled). Note that these are the numbers from reading 50M; I see the same with 5M.
      ================================================================================
      
      Results:
      op rate                   : 3275 [read:3275]
      partition rate            : 3275 [read:3275]
      row rate                  : 3275 [read:3275]
      latency mean              : 91.6 [read:91.6]
      latency median            : 88.8 [read:88.8]
      latency 95th percentile   : 107.2 [read:107.2]
      latency 99th percentile   : 116.0 [read:116.0]
      latency 99.9th percentile : 125.5 [read:125.5]
      latency max               : 249.0 [read:249.0]
      Total partitions          : 50000000 [read:50000000]
      Total errors              : 0 [read:0]
      total gc count            : 0
      total gc mb               : 0
      total gc time (s)         : 0
      avg gc time(ms)           : NaN
      stdev gc time(ms)         : 0
      Total operation time      : 04:14:26
      END
      
      -rw-rw-r-- 1 aboudreault aboudreault 9.5G Aug 31 09:40 ma-1079-big-Data.db
      
      2.1 before major compaction (Key cache and row cache disabled)
      ==============================================================
      
      Results:
      op rate                   : 21348 [read:21348]
      partition rate            : 21348 [read:21348]
      row rate                  : 21348 [read:21348]
      latency mean              : 14.1 [read:14.1]
      latency median            : 8.0 [read:8.0]
      latency 95th percentile   : 38.5 [read:38.5]
      latency 99th percentile   : 60.8 [read:60.8]
      latency 99.9th percentile : 99.2 [read:99.2]
      latency max               : 229.2 [read:229.2]
      Total partitions          : 5000000 [read:5000000]
      Total errors              : 0 [read:0]
      total gc count            : 0
      total gc mb               : 0
      total gc time (s)         : 0
      avg gc time(ms)           : NaN
      stdev gc time(ms)         : 0
      Total operation time      : 00:03:54
      END
      
      2.1 after major compaction (Key cache and row cache disabled)
      =============================================================
      
      Results:
      op rate                   : 5262 [read:5262]
      partition rate            : 5262 [read:5262]
      row rate                  : 5262 [read:5262]
      latency mean              : 57.0 [read:57.0]
      latency median            : 55.5 [read:55.5]
      latency 95th percentile   : 69.4 [read:69.4]
      latency 99th percentile   : 83.3 [read:83.3]
      latency 99.9th percentile : 197.4 [read:197.4]
      latency max               : 1169.0 [read:1169.0]
      Total partitions          : 5000000 [read:5000000]
      Total errors              : 0 [read:0]
      total gc count            : 0
      total gc mb               : 0
      total gc time (s)         : 0
      avg gc time(ms)           : NaN
      stdev gc time(ms)         : 0
      Total operation time      : 00:15:50
      END
      

      I can reproduce that read performance regression on EC2 and locally. To reproduce:

      1. Launch a 1-node cluster (2.1, 2.2 or 3.0).
      2. Set the compaction throughput to 0 (a restart is needed, IIRC); see the sketch after step 3's command.
      3. Write 50M elements (so we get the same sstable size for the test). The YAML profile is attached to this ticket. Ensure you are using stress from apache/cassandra-3.0, since trunk is broken at the moment.

      cassandra-stress user profile=`pwd`/users-caching.yaml ops\(insert=1\) n=50M -rate threads=100
      
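      For step 2, two ways to do it, assuming a ccm-managed node named node1 (the cassandra.yaml route is presumably what needs the restart mentioned in step 2):

      # Takes effect immediately, but is lost on restart:
      ccm node1 nodetool setcompactionthroughput 0

      # Or persistently, in the node's cassandra.yaml (requires a restart):
      #   compaction_throughput_mb_per_sec: 0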

      4. Flush the data and wait for the auto-compaction to finish. You should have around 2-6 sstables when it's done (see the sketch after step 6's command).
      5. Restart Cassandra
      6. Read 5M elements

      cassandra-stress user profile=/path/to/users-caching.yaml ops\(read=1\) n=5M -rate threads=300
      
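      For steps 4 and 5, something along these lines works with ccm (node name node1 assumed):

      ccm node1 nodetool flush
      ccm node1 nodetool compactionstats   # repeat until it reports "pending tasks: 0"
      ccm node1 nodetool cfstats           # "SSTable count" for the users table should be around 2-6
      ccm stop && ccm start                # step 5: restart Cassandra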

      7. Restart C*, then start a major compaction and wait for it to finish.

      ccm stop && ccm start
      ccm nodetool compact
      

      8. Read 5M elements

      cassandra-stress user profile=/path/to/users-caching.yaml ops\(read=1\) n=5M -rate threads=300
      

      Attachments

        1. users-caching.yaml (0.8 kB, Alan Boudreault)


          People

            Assignee: Unassigned
            Reporter: Alan Boudreault (aboudreault)
            Votes: 0
            Watchers: 8
