
CASSANDRA-16638: compactions/repairs hang (backport CASSANDRA-16552)


    Description

      Hi,

      We have been hitting an issue during repairs (though in fact it is more likely a compaction issue) since we upgraded from 3.11.1 to 3.11.10.

      We are using Reaper, but the issue does not seem to come from it (according to Alexander DEJANOVSKI). When the problem happens, repairs driven by Reaper are blocked.

      Basically, Reaper hangs with the message "All nodes are busy or have too many pending compactions for the remaining candidate segments.", and indeed one node has a lot of pending compaction tasks:

       

      $ nodetool compactionstats
      pending tasks: 95
      - mt_metrics.metric_32: 95 
      
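      For what it's worth, a quick way to spot which node is accumulating tasks across the cluster (the hostnames below are placeholders) is something like:

      # check pending compactions on every node; hostnames are examples only
      for host in node1 node2 node3; do
          echo "== $host =="
          nodetool -h "$host" compactionstats | grep "pending tasks"
      done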

      The errors in the log are:

       

      WARN [CompactionExecutor:12909] 2021-04-28 08:59:51,241 LeveledCompactionStrategy.java:144 - Could not acquire references for compacting SSTables [BigTableReader(path='/var/lib/cassandra/d
      ....
      WARN [CompactionExecutor:12909] 2021-04-28 09:00:19,484 LeveledCompactionStrategy.java:144 - Could not acquire references for compacting SSTables [BigTableReader(path='/var/lib/cassandra/d
      ....
      WARN [CompactionExecutor:12908] 2021-04-28 09:00:51,241 LeveledCompactionStrategy.java:144 - Could not acquire references for compacting SSTables [BigTableReader(path='/var/lib/cassandra/d
      ....
      WARN [CompactionExecutor:12907] 2021-04-28 08:58:51,097 LeveledCompactionStrategy.java:144 - Could not acquire references for compacting SSTables [BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350757-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350755-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350738-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350759-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350761-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350740-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350751-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/
      .... 
      
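      To get an idea of how often this happens, the warning can simply be counted in the logs (adjust the path to your install), e.g.:

      # count the "Could not acquire references" warnings per day
      grep "Could not acquire references for compacting SSTables" /var/log/cassandra/system.log \
          | awk '{print $3}' | sort | uniq -c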

      The error has happened several times over the last few weeks, and so far it always concerns LCS tables.

      a.dejanoski pointed me to https://issues.apache.org/jira/browse/CASSANDRA-15242, but I have no trace of messages like "disk boundaries are out of date for keyspacename.tablename" or "Refreshing disk boundary cache for keyspacename.tablename".
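      Checking for those messages is just a grep over the logs (the log path assumes a default package install), e.g.:

      # look for the CASSANDRA-15242 disk boundary messages; nothing comes up here
      grep -E "disk boundaries are out of date|Refreshing disk boundary cache" \
          /var/log/cassandra/system.log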

      The workaround is simple: just restart the node once it has been identified. The pending compaction tasks then run fine again.
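      Concretely, the restart looks something like this (the service name depends on the install):

      # drain the node so it flushes memtables and stops accepting requests,
      # then restart the Cassandra service (service name is an assumption)
      nodetool drain
      sudo systemctl restart cassandra
      # compactions start catching up again after the restart
      nodetool compactionstats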

      We have the issue on 2 of our clusters running 3.11.10.
      Has anyone else met this issue?

          People

            Assignee: Unassigned
            Reporter: regis le bretonnic (easyoups)
