[CASSANDRA-19607] Compaction double reads every chunk - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Duplicate
Fix Version/s: None
Component/s: None
Labels:
None

Platform:

All
Impacts:

None

Description

I was taking a look at the I/O operations performed during compaction and noticed it looks like we're double reading every chunk of the filesystem, meaning we perform 2x the system calls that we need to. The second read will come from page cache, but this isn't free, so we should try to avoid it. Here's the steps to reproduce.

First, wrote a small amount of data to a table and flush. I have a single SSTable that's 151KB here:

-rw-r--r-- 1 cassandra  127 May  1 21:00 da-8-bti-CompressionInfo.db
-rw-r--r-- 1 cassandra 151K May  1 21:00 da-8-bti-Data.db
-rw-r--r-- 1 cassandra   10 May  1 21:00 da-8-bti-Digest.crc32
-rw-r--r-- 1 cassandra  136 May  1 21:00 da-8-bti-Filter.db
-rw-r--r-- 1 cassandra  829 May  1 21:00 da-8-bti-Partitions.db
-rw-r--r-- 1 cassandra    0 May  1 21:00 da-8-bti-Rows.db
-rw-r--r-- 1 cassandra 5.0K May  1 21:00 da-8-bti-Statistics.db
-rw-r--r-- 1 cassandra   94 May  1 21:00 da-8-bti-TOC.txt

Then watch all reads by doing the following:

$ xfsslower -p $PID 0

Next, start a event profiling session on ChannelProxy.read by doing the following with async-profiler:

$ asprof -e org.apache.cassandra.io.util.ChannelProxy.read -d 10 $(cassandra-pid) -f /mnt/cassandra/artifacts/channelproxy-read.txt

Then I started a compaction on the table which lasted roughly a second.

I took the log of filesystem operations and pulled out all the accesses to `da-8-bti-Data.db` using ripgrep:

$ rg da-8-bti-Data.db artifacts/cassandra1/xfsslower-da8.txt

The headings are:

TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME

The data:

27:21:07:05 CompactionExec 30550  R 16387   0           0.02 da-8-bti-Data.db
28:21:07:05 CompactionExec 30550  R 16387   0           0.00 da-8-bti-Data.db
40:21:07:05 CompactionExec 30550  R 16436   16          0.01 da-8-bti-Data.db
41:21:07:05 CompactionExec 30550  R 16436   16          0.01 da-8-bti-Data.db
44:21:07:05 CompactionExec 30550  R 16419   32          0.01 da-8-bti-Data.db
45:21:07:05 CompactionExec 30550  R 16419   32          0.01 da-8-bti-Data.db
48:21:07:05 CompactionExec 30550  R 16441   48          0.01 da-8-bti-Data.db
49:21:07:05 CompactionExec 30550  R 16441   48          0.00 da-8-bti-Data.db
52:21:07:05 CompactionExec 30550  R 16421   64          0.01 da-8-bti-Data.db
53:21:07:05 CompactionExec 30550  R 16421   64          0.01 da-8-bti-Data.db
56:21:07:05 CompactionExec 30550  R 16415   80          0.01 da-8-bti-Data.db
57:21:07:05 CompactionExec 30550  R 16415   80          0.01 da-8-bti-Data.db
60:21:07:05 CompactionExec 30550  R 16439   96          0.01 da-8-bti-Data.db
61:21:07:05 CompactionExec 30550  R 16439   96          0.00 da-8-bti-Data.db
64:21:07:05 CompactionExec 30550  R 16390   112         0.01 da-8-bti-Data.db
65:21:07:05 CompactionExec 30550  R 16390   112         0.00 da-8-bti-Data.db
68:21:07:05 CompactionExec 30550  R 16384   128         0.01 da-8-bti-Data.db
69:21:07:05 CompactionExec 30550  R 16384   128         0.01 da-8-bti-Data.db
72:21:07:05 CompactionExec 30550  R 6019    144         0.00 da-8-bti-Data.db
73:21:07:05 CompactionExec 30550  R 6019    144         0.00 da-8-bti-Data.db

There are 20 IO operations reported here, each of which occurs twice. I was originally thinking this might be a bug with the reporting, but other files operations are listed only once. For example:

21:07:05 CompactionExec 30550  W 16386   112         0.09 da-9-bti-Data.db
21:07:05 CompactionExec 30550  W 4       128         0.00 da-9-bti-Data.db

Looking at the output, the last read is approximately 6KB and starts at the 144KB offset which matches up to the total length of the file.

The profiler output indicates there's 2 code paths that arrive at the FS operation 9 times.

I've also attached the profiler output, which shows 22 calls to org.apache.cassandra.io.util.ChannelProxy.read. Two of these originate in the Stats file, leaving 9 chunks in this file, read twice, plus 2 additional calls.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

channelproxy-read-da8.txt
01/May/24 21:28
14 kB
Jon Haddad
channelproxy-read-da9-rev.html
01/May/24 21:28
11 kB
Jon Haddad

Issue Links

duplicates

CASSANDRA-15452 Improve disk access patterns during compaction (big format)

Review In Progress

Compaction double reads every chunk

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates