Cassandra / CASSANDRA-10229

Fix cassandra-stress gaussian behaviour for shuffling the distribution, to mitigate read perf after a major compaction


Details

    • Type: Improvement
    • Status: Open
    • Priority: Low
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: Tool/stress

    Description

      TITLE WAS: BAD READ PERFORMANCE AFTER A MAJOR COMPACTION

      I am trying to understand what I am seeing. My scenario is very basic: a simple users table with the key cache and row cache disabled. I write 50M elements, then read 5M random elements. The read performance is not that bad BEFORE a major compaction of the data, but I see a ~3x performance regression AFTER I run a major compaction.
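      (It is not stated above exactly how the caches were disabled; as a rough sketch only, here are two ways to get a table with no key cache and no row cache. The keyspace/table names and the cassandra.yaml route are assumptions, the attached users-caching.yaml defines the real schema.)

      # Per table, via CQL (names are hypothetical):
      cqlsh 127.0.0.1 -e "ALTER TABLE stresscql.users WITH caching = {'keys': 'NONE', 'rows_per_partition': 'NONE'};"

      # Or globally, in cassandra.yaml (requires a restart):
      #   key_cache_size_in_mb: 0
      #   row_cache_size_in_mb: 0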

      Here are the read performance numbers for my scenario:

      3.0 before a major compaction (Key cache and row cache disabled). Note that these are the numbers from reading 50M; I see the same with 5M.
      ==================================================================================
      
      Results:
      op rate                   : 9149 [read:9149]
      partition rate            : 9149 [read:9149]
      row rate                  : 9149 [read:9149]
      latency mean              : 32.8 [read:32.8]
      latency median            : 31.2 [read:31.2]
      latency 95th percentile   : 47.2 [read:47.2]
      latency 99th percentile   : 55.0 [read:55.0]
      latency 99.9th percentile : 66.4 [read:66.4]
      latency max               : 305.4 [read:305.4]
      Total partitions          : 50000000 [read:50000000]
      Total errors              : 0 [read:0]
      total gc count            : 0
      total gc mb               : 0
      total gc time (s)         : 0
      avg gc time(ms)           : NaN
      stdev gc time(ms)         : 0
      Total operation time      : 01:31:05
      END
      
      -rw-rw-r-- 1 aboudreault aboudreault  4.7G Aug 31 08:51 ma-1024-big-Data.db
      -rw-rw-r-- 1 aboudreault aboudreault  4.9G Aug 31 09:08 ma-1077-big-Data.db
      
      3.0 after a major compaction (Key cache and row cache disabled). Note that these are the numbers from reading 50M; I see the same with 5M.
      ================================================================================
      
      Results:
      op rate                   : 3275 [read:3275]
      partition rate            : 3275 [read:3275]
      row rate                  : 3275 [read:3275]
      latency mean              : 91.6 [read:91.6]
      latency median            : 88.8 [read:88.8]
      latency 95th percentile   : 107.2 [read:107.2]
      latency 99th percentile   : 116.0 [read:116.0]
      latency 99.9th percentile : 125.5 [read:125.5]
      latency max               : 249.0 [read:249.0]
      Total partitions          : 50000000 [read:50000000]
      Total errors              : 0 [read:0]
      total gc count            : 0
      total gc mb               : 0
      total gc time (s)         : 0
      avg gc time(ms)           : NaN
      stdev gc time(ms)         : 0
      Total operation time      : 04:14:26
      END
      
      -rw-rw-r-- 1 aboudreault aboudreault 9.5G Aug 31 09:40 ma-1079-big-Data.db
      
      2.1 before major compaction (Key cache and row cache disabled)
      ==============================================================
      
      Results:
      op rate                   : 21348 [read:21348]
      partition rate            : 21348 [read:21348]
      row rate                  : 21348 [read:21348]
      latency mean              : 14.1 [read:14.1]
      latency median            : 8.0 [read:8.0]
      latency 95th percentile   : 38.5 [read:38.5]
      latency 99th percentile   : 60.8 [read:60.8]
      latency 99.9th percentile : 99.2 [read:99.2]
      latency max               : 229.2 [read:229.2]
      Total partitions          : 5000000 [read:5000000]
      Total errors              : 0 [read:0]
      total gc count            : 0
      total gc mb               : 0
      total gc time (s)         : 0
      avg gc time(ms)           : NaN
      stdev gc time(ms)         : 0
      Total operation time      : 00:03:54
      END
      
      2.1 after major compaction (Key cache and row cache disabled)
      =============================================================
      
      Results:
      op rate                   : 5262 [read:5262]
      partition rate            : 5262 [read:5262]
      row rate                  : 5262 [read:5262]
      latency mean              : 57.0 [read:57.0]
      latency median            : 55.5 [read:55.5]
      latency 95th percentile   : 69.4 [read:69.4]
      latency 99th percentile   : 83.3 [read:83.3]
      latency 99.9th percentile : 197.4 [read:197.4]
      latency max               : 1169.0 [read:1169.0]
      Total partitions          : 5000000 [read:5000000]
      Total errors              : 0 [read:0]
      total gc count            : 0
      total gc mb               : 0
      total gc time (s)         : 0
      avg gc time(ms)           : NaN
      stdev gc time(ms)         : 0
      Total operation time      : 00:15:50
      END
      

      I can reproduce that read performance regression on EC2 and locally. To reproduce:

      1. Launch a 1-node cluster (2.1, 2.2 or 3.0).
      2. Set the compaction throughput to 0 (a restart is needed, IIRC); see the sketch after step 3's command.
      3. Write 50M elements (so we get the same sstable size for the test). The YAML profile is attached to this ticket. Ensure you are using stress from apache/cassandra-3.0, since trunk is broken at the moment.

      cassandra-stress user profile=`pwd`/users-caching.yaml ops\(insert=1\) n=50M -rate threads=100
      
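      For step 2, two ways to do it, assuming a ccm-managed node named node1 (the cassandra.yaml route is presumably what needs the restart mentioned in step 2):

      # Takes effect immediately, but is lost on restart:
      ccm node1 nodetool setcompactionthroughput 0

      # Or persistently, in the node's cassandra.yaml (requires a restart):
      #   compaction_throughput_mb_per_sec: 0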

      4. Flush the data and wait for the auto-compaction to finish. You should have around 2-6 sstables when it's done (see the sketch after step 6's command).
      5. Restart Cassandra
      6. Read 5M elements

      cassandra-stress user profile=/path/to/users-caching.yaml ops\(read=1\) n=5M -rate threads=300
      
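      For steps 4 and 5, something along these lines works with ccm (node name node1 assumed):

      ccm node1 nodetool flush
      ccm node1 nodetool compactionstats   # repeat until it reports "pending tasks: 0"
      ccm node1 nodetool cfstats           # "SSTable count" for the users table should be around 2-6
      ccm stop && ccm start                # step 5: restart Cassandra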

      7. Restart C*, then start a major compaction and wait for it to finish.

      ccm stop && ccm start
      ccm nodetool compact
      

      8. Read 5M elements

      cassandra-stress user profile=/path/to/users-caching.yaml ops\(read=1\) n=5M -rate threads=300
      

      Attachments

        1. users-caching.yaml (0.8 kB, Alan Boudreault)


          People

            Assignee: Unassigned
            Reporter: Alan Boudreault (aboudreault)
            Votes: 0
            Watchers: 8
