[CASSANDRA-11209] SSTable ancestor leaked reference - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Cannot Reproduce
Fix Version/s: None
Component/s: Local/Compaction
Labels: None
Severity: Normal
Since Version: 2.1.13

Description

We're running a fork of 2.1.13 that adds the TimeWindowCompactionStrategy from jjirsa. We ran 4 clusters without any issues for many months. A few weeks ago we started scheduling incremental repairs every 24 hours (previously we didn't run any repairs at all).

Since then, we have been seeing big discrepancies between LiveDiskSpaceUsed, TotalDiskSpaceUsed, and the actual size of the files on disk. Restarting the node brings the numbers back in sync. We also noticed that when this bug happens, several ancestors don't get cleaned up. A restart queues up a lot of compactions that slowly eat away the ancestors.

I looked at the code and noticed that the LiveTotalDiskUsed metric is only decreased in the SSTableDeletingTask. Since no errors are being logged, I'm assuming that for some reason this task is not getting queued up. If I understand correctly, the task is only queued when the reference count for the SSTable reaches 0. This leads us to believe that something during repairs and/or compactions is leaking a reference to the ancestor SSTable.
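
The suspected mechanism can be illustrated with a minimal, self-contained sketch. This is not Cassandra's code: `RefLeakSketch`, `TrackedFile`, `liveBytes`, `ref()` and `unref()` are hypothetical names chosen for illustration. The sketch assumes only what is described above, namely that the disk-usage metric is decremented by a cleanup step that runs when the reference count reaches 0.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; none of these names exist in Cassandra.
class RefLeakSketch {
    // Stand-in for a "live disk space used" metric.
    static final AtomicLong liveBytes = new AtomicLong();

    static final class TrackedFile {
        private final long sizeBytes;
        // Starts at 1: the reference held by the owner (e.g. the live set).
        private final AtomicInteger refs = new AtomicInteger(1);

        TrackedFile(long sizeBytes) {
            this.sizeBytes = sizeBytes;
            liveBytes.addAndGet(sizeBytes);
        }

        // A reader (e.g. a repair or compaction) takes a reference.
        void ref() {
            refs.incrementAndGet();
        }

        // Cleanup runs only when the last reference is released.
        void unref() {
            if (refs.decrementAndGet() == 0) {
                liveBytes.addAndGet(-sizeBytes); // stand-in for the deleting task
            }
        }
    }

    public static void main(String[] args) {
        TrackedFile ancestor = new TrackedFile(100);

        ancestor.ref();   // a reader takes a reference
        ancestor.unref(); // the owner drops its reference after the file is replaced

        // The reader never released its reference, so cleanup has not run
        // and the metric still counts the obsolete file.
        System.out.println("leaked:   liveBytes = " + liveBytes.get());

        ancestor.unref(); // releasing the missing reference triggers cleanup
        System.out.println("released: liveBytes = " + liveBytes.get());
    }
}
```

In this model, a single unreleased reference is enough to keep the obsolete file counted indefinitely without any error being raised, which would match the silent metric drift described above.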

Attachments

- screenshot-2.png, 23/Feb/16 15:44, 41 kB, Jose Fernandez
- screenshot-1.png, 22/Feb/16 21:48, 59 kB, Jose Fernandez

Issue Links

duplicates

CASSANDRA-11215 Reference leak with parallel repairs on the same table

Resolved

People

Assignee: Marcus Eriksson

Reporter: Jose Fernandez

Authors: Marcus Eriksson

Votes: 0

Watchers: 7

Dates

Created: 22/Feb/16 20:50

Updated: 16/Apr/19 09:30

Resolved: 08/Jul/16 14:35