[CASSANDRA-11623] Compactions w/ Short Rows Spending Time in getOnDiskFilePointer - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Low
Resolution: Fixed
Fix Version/s: 3.8
Component/s: None
Labels: None

Description

I've been doing some performance tuning and profiling of my Cassandra cluster and noticed that compaction was particularly slow for tables that I know have very short rows. Profiling shows a large share of time being spent in BigTableWriter.getOnDiskFilePointer(), and attaching strace to a CompactionTask shows that the majority of time is spent in lseek (called by getOnDiskFilePointer), not in read or write.

Going deeper, it looks like we call getOnDiskFilePointer for each row (sometimes multiple times per row) to check whether we've reached the expected sstable size and should start a new writer. Doing this on every row is pretty unnecessary.
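
To illustrate the cost being described, here is a minimal sketch of the general idea of tracking the write position in memory instead of asking the OS for it on every row. It is not the patch that resolved this ticket and not Cassandra's BigTableWriter: the class CountingWriter and its methods are hypothetical names made up for this example. It relies on FileChannel.position(), which on typical Linux/OpenJDK setups is served by an lseek call.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical writer for illustration only; not a Cassandra class. */
final class CountingWriter implements AutoCloseable {
    private final FileChannel channel;
    // Running total of bytes handed to the channel, kept purely in memory.
    private long bytesWritten;

    CountingWriter(Path path) throws IOException {
        channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING);
    }

    /** Appends one row and updates the in-memory counter. */
    void append(byte[] row) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(row);
        while (buf.hasRemaining()) {
            bytesWritten += channel.write(buf);
        }
    }

    /** Costly per-row check: asks the OS for the current file position. */
    long positionFromChannel() throws IOException {
        return channel.position();
    }

    /** Cheap per-row check: reads the counter, no system call involved. */
    long positionFromCounter() {
        return bytesWritten;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

A size-based rollover check in a per-row loop could then compare positionFromCounter() against the target sstable size. Note the assumption baked in here: the counter only matches the on-disk position for an uncompressed, unbuffered file. With compression or buffering, the counter would have to be updated when chunks are actually flushed.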

Attachments

compactiontask_profile.png (339 kB), attached by Tom Petracca on 20/Apr/16 18:33

Issue Links

is related to: CASSANDRA-11697 Improve Compaction Throughput (Open)

People

Assignee: Tom Petracca
Reporter: Tom Petracca
Authors: Tom Petracca
Reviewers: Marcus Eriksson
Votes: 0
Watchers: 7

Dates

Created: 20/Apr/16 18:33
Updated: 16/Apr/19 09:30
Resolved: 11/May/16 08:47