[CASSANDRA-13756] StreamingHistogram is not thread safe - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 3.0.15, 3.11.1
Component/s: None
Labels: None
Severity: Normal

Description

While testing C* 3 in a shadow cluster, we noticed that after a period of time several data nodes suddenly hit 100% CPU and stopped processing queries.

After investigation, we found that threads were stuck in sum() in the StreamingHistogram class. These are JMX threads that expose the getTombStoneRatio metric (since JMX is kicked off every 3 seconds, there is a chance that multiple JMX threads access StreamingHistogram at the same time).

After further investigation, we found that the optimization in CASSANDRA-13038 leads to a spool flush every time sum() is called. Since TreeMap is not thread safe, threads get stuck when several of them call sum() at the same time.
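To make the failure mode concrete, here is a minimal sketch of the pattern described above. It is not the Cassandra source: the class name `SpoolingHistogramSketch`, its fields, and the simplified `sum()` (a plain count with no interpolation) are all assumptions made for illustration. The point is only that a method that looks like a read ends up writing to a TreeMap.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, heavily simplified stand-in for a spooling histogram.
// This is NOT the actual StreamingHistogram implementation.
class SpoolingHistogramSketch {
    // Sorted bins: point -> count. TreeMap is not safe for concurrent mutation.
    final TreeMap<Double, Long> bins = new TreeMap<>();
    // Buffer of pending updates that is merged into bins lazily.
    final Map<Double, Long> spool = new HashMap<>();

    // Records a point in the buffer only; bins is untouched here.
    void update(double point) {
        spool.merge(point, 1L, Long::sum);
    }

    // Moves buffered updates into the sorted map (writes to both maps).
    void flushSpool() {
        for (Map.Entry<Double, Long> e : spool.entrySet()) {
            bins.merge(e.getKey(), e.getValue(), Long::sum);
        }
        spool.clear();
    }

    // Looks like a read, but flushes first, so each call may mutate the TreeMap.
    long sum(double upTo) {
        flushSpool();
        long total = 0;
        for (long count : bins.headMap(upTo, true).values()) {
            total += count;
        }
        return total;
    }
}
```

With a single caller this is harmless. With two callers inside `sum()` at once, both can be restructuring the same TreeMap, which is the unsynchronized concurrent mutation that the TreeMap documentation warns against.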

There are two approaches to solving this issue.

The first is to add a lock around the flush in sum(), which introduces some extra overhead in StreamingHistogram.

The second is to prevent StreamingHistogram from being accessed by multiple threads. In our specific case, that means removing the metrics we added.
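As a rough illustration of the first approach, the sketch below guards both the buffered write and the flushing read with the object's monitor. The class `LockedHistogramSketch` and its members are hypothetical and simplified, and this is not necessarily what the committed fix does; it only shows where the lock would sit and why every caller of sum() pays for it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the "lock the flush" approach; not the Cassandra patch.
class LockedHistogramSketch {
    private final TreeMap<Double, Long> bins = new TreeMap<>();
    private final Map<Double, Long> spool = new HashMap<>();

    // Writers take the same monitor as readers because sum() also mutates state.
    synchronized void update(double point) {
        spool.merge(point, 1L, Long::sum);
    }

    // The flush and the traversal happen under one lock, so no thread can
    // observe or modify the TreeMap while another thread is restructuring it.
    synchronized long sum(double upTo) {
        flushSpool();
        long total = 0;
        for (long count : bins.headMap(upTo, true).values()) {
            total += count;
        }
        return total;
    }

    // Only called while holding the monitor.
    private void flushSpool() {
        for (Map.Entry<Double, Long> e : spool.entrySet()) {
            bins.merge(e.getKey(), e.getValue(), Long::sum);
        }
        spool.clear();
    }
}
```

The trade-off is the overhead mentioned above: every call serializes on the lock, including calls from the hot update path, which is the cost the second approach avoids by keeping the histogram confined to a single thread.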

Issue Links

is duplicated by

CASSANDRA-13752 Corrupted SSTables created in 3.11

Resolved

CASSANDRA-13718 ConcurrentModificationException in nodetool upgradesstables

Resolved

is related to

CASSANDRA-13752 Corrupted SSTables created in 3.11

Resolved

CASSANDRA-13718 ConcurrentModificationException in nodetool upgradesstables

Resolved

relates to

CASSANDRA-13038 33% of compaction time spent in StreamingHistogram.update()

Resolved

People

Assignee: Jeff Jirsa

Reporter: xiangzhou xia

Authors: Jeff Jirsa

Reviewers: Jason Brown

Votes: 0

Watchers: 15

Dates

Created: 11/Aug/17 01:45

Updated: 16/Apr/19 09:30

Resolved: 05/Sep/17 16:58