  Cassandra / CASSANDRA-1991

CFS.maybeSwitchMemtable() calls CommitLog.instance.getContext(), which may block, under flusher lock write lock


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: 1.1.1
    • Component/s: None

    Description

      While investigating CASSANDRA-1955 I realized I was seeing very poor latencies for reasons that had nothing to do with flush_writers, even when using periodic commit log mode (and with flush writers set ridiculously high, 500).

      It turns out blocked writes were slow because Table.apply() was spending a lot of time (I can easily trigger multi-second stalls on a moderate workload) trying to acquire the flusher lock's read lock (the "flush lock millis" log printout in the logging patch I'll attach).

      That in turn is caused by CFS.maybeSwitchMemtable(), which acquires the flusher lock's write lock.
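
      To make the contention concrete, here is a minimal, hypothetical sketch of that locking pattern (FlusherLockSketch, apply() and maybeSwitchMemtable() below are illustrative stand-ins, not the actual Cassandra code): every write takes the read lock, the memtable switch takes the write lock, so anything slow done under the write lock stalls all writes.

      import java.util.concurrent.locks.ReentrantReadWriteLock;

      // Hypothetical sketch of the contention described above; names are
      // illustrative, not the real Cassandra classes.
      class FlusherLockSketch
      {
          private final ReentrantReadWriteLock flusherLock = new ReentrantReadWriteLock();

          // Roughly what Table.apply() does around each mutation.
          void apply(Runnable mutation)
          {
              flusherLock.readLock().lock();
              try
              {
                  mutation.run();
              }
              finally
              {
                  flusherLock.readLock().unlock();
              }
          }

          // Roughly what CFS.maybeSwitchMemtable() does around the switch.
          void maybeSwitchMemtable(Runnable switchWork)
          {
              flusherLock.writeLock().lock();
              try
              {
                  // If switchWork blocks (e.g. on CommitLog.getContext()), every
                  // concurrent apply() waits on the read lock for just as long.
                  switchWork.run();
              }
              finally
              {
                  flusherLock.writeLock().unlock();
              }
          }
      }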

      Bisecting further revealed that the offending line of code that blocked was:

      final CommitLogSegment.CommitLogContext ctx = writeCommitLog ? CommitLog.instance.getContext() : null;

      Indeed, CommitLog.getContext() simply returns currentSegment().getContext(), but does so by submitting a Callable to the commit log's service executor and waiting for the result. So, independently of flush writers, this can very easily block all writes (globally, across all column families), and it does.
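
      In other words, the pattern looks roughly like the following hypothetical sketch (CommitLogSketch, Segment and Context are illustrative stand-ins for the real classes): the caller hands a Callable to the single commit log executor and waits on the future, so it queues behind whatever write/sync tasks are already pending.

      import java.util.concurrent.Callable;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;

      // Hypothetical, simplified sketch of the blocking pattern described
      // above; not the actual Cassandra source.
      class CommitLogSketch
      {
          static class Context
          {
              final long position;
              Context(long position) { this.position = position; }
          }

          static class Segment
          {
              volatile long position;
              Context getContext() { return new Context(position); }
          }

          private final ExecutorService serviceExecutor = Executors.newSingleThreadExecutor();
          private final Segment currentSegment = new Segment();

          Context getContext() throws Exception
          {
              Callable<Context> task = () -> currentSegment.getContext();
              // submit().get() waits behind every write/sync task already queued
              // on the single commit log executor, so a caller holding the
              // flusher write lock (maybeSwitchMemtable) can stall here under
              // load, and with it every writer waiting on the read lock.
              return serviceExecutor.submit(task).get();
          }
      }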

      I'll attach a standalone Python script that triggers it on my macOS laptop (with an Intel SSD, which is why I was particularly surprised). It assumes CPython and an out-of-the-box (or nearly so) Cassandra on localhost that is not part of a cluster, and it will drop and recreate a keyspace called '1955'.

      I'm also attaching, just FYI, the patch with log entries that I used while tracking it down.

      Finally, I'll attach a patch with a suggested solution: keep track of the latest commit log segment with an AtomicReference (as an alternative to synchronizing all access to segments). With that patch applied, latencies are no longer affected by my trigger case as they were before. There are still some sub-optimal > 100 ms cases on my test machine, but for other reasons; I'm no longer able to trigger the extremes.
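
      The idea is roughly the following (again a hypothetical sketch with illustrative names, not the patch itself): the commit log keeps an AtomicReference to the active segment, swapped on rollover, so getContext() reads it directly instead of going through the executor.

      import java.util.concurrent.atomic.AtomicReference;

      // Hypothetical sketch of the AtomicReference idea; names are
      // illustrative, not the attached patch.
      class CommitLogAtomicSketch
      {
          static class Context
          {
              final long position;
              Context(long position) { this.position = position; }
          }

          static class Segment
          {
              volatile long position;
              Context getContext() { return new Context(position); }
          }

          private final AtomicReference<Segment> activeSegment =
                  new AtomicReference<>(new Segment());

          // Called when the commit log rolls over to a new segment.
          void activateSegment(Segment newSegment)
          {
              activeSegment.set(newSegment);
          }

          // No executor round-trip: the caller reads the current segment's
          // context directly, so it never blocks behind queued commit log work.
          Context getContext()
          {
              return activeSegment.get().getContext();
          }
      }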

      Attachments

        1. 1991.txt
          9 kB
          Jonathan Ellis
        2. 1991-v9.txt
          15 kB
          Peter Schuller
        3. 1991-v8.txt
          15 kB
          Peter Schuller
        4. 1991-v7.txt
          12 kB
          Peter Schuller
        5. 1991-v6.txt
          13 kB
          Peter Schuller
        6. 1991-v5.txt
          12 kB
          Jonathan Ellis
        7. 1991-checkpointing-flush.txt
          17 kB
          Peter Schuller
        8. 1991-v4.txt
          7 kB
          Jonathan Ellis
        9. 1991-v3.txt
          3 kB
          Jonathan Ellis
        10. 1991-trunk-v2.txt
          4 kB
          Peter Schuller
        11. 1991-trunk.txt
          3 kB
          Peter Schuller
        12. 1991-logchanges.txt
          13 kB
          Peter Schuller
        13. trigger.py
          2 kB
          Peter Schuller


          People

            Assignee: Jonathan Ellis (jbellis)
            Reporter: Peter Schuller (scode)
            Jonathan Ellis
            Sylvain Lebresne
            Votes: 2
            Watchers: 3
