CASSANDRA-16638

Compactions/repairs hang (backport CASSANDRA-16552)


Details

    • Degradation
    • Normal
    • Normal
    • User Report
    • All
    • None

    Description

      Hi

      We have been hitting an issue during repairs (though it is more likely a compaction issue) since we upgraded from 3.11.1 to 3.11.10.

      We are using Reaper, but the issue does not seem to come from it (according to adejanovski@hotmail.com). When the problem occurs, repairs driven by Reaper are blocked.

      Reaper hangs with the message "All nodes are busy or have too many pending compactions for the remaining candidate segments." and, indeed, one node has a large number of pending compaction tasks:

       

      $ nodetool compactionstats
      pending tasks: 95
      - mt_metrics.metric_32: 95 
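
To spot the affected node without reading this output by hand on every host, the text can be parsed. The following is only a minimal sketch: it assumes the output has exactly the two-line shape shown above (a `pending tasks:` total followed by `- keyspace.table: N` lines), and the function name `parse_pending` is hypothetical. How the output is collected from each node is left out.

```python
import re


def parse_pending(output: str) -> dict:
    """Parse `nodetool compactionstats` text shaped like the sample above.

    Returns {"total": int, "tables": {"keyspace.table": int}}.
    Assumption: only the 'pending tasks' header and '- name: N' lines matter.
    """
    total = 0
    tables = {}
    for raw in output.splitlines():
        line = raw.strip()
        header = re.match(r"pending tasks:\s*(\d+)", line)
        if header:
            total = int(header.group(1))
            continue
        entry = re.match(r"-\s*(\S+):\s*(\d+)", line)
        if entry:
            tables[entry.group(1)] = int(entry.group(2))
    return {"total": total, "tables": tables}


# Sample copied from the output reported above.
sample = """pending tasks: 95
- mt_metrics.metric_32: 95
"""
result = parse_pending(sample)
print(result)
```

A node whose total stays high and never decreases between two checks is a candidate for the hang described here.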
      

      The errors in the log are:

       

      WARN [CompactionExecutor:12909] 2021-04-28 08:59:51,241 LeveledCompactionStrategy.java:144 - Could not acquire references for compacting SSTables [BigTableReader(path='/var/lib/cassandra/d
      ....
      WARN [CompactionExecutor:12909] 2021-04-28 09:00:19,484 LeveledCompactionStrategy.java:144 - Could not acquire references for compacting SSTables [BigTableReader(path='/var/lib/cassandra/d
      ....
      WARN [CompactionExecutor:12908] 2021-04-28 09:00:51,241 LeveledCompactionStrategy.java:144 - Could not acquire references for compacting SSTables [BigTableReader(path='/var/lib/cassandra/d
      ....
      WARN [CompactionExecutor:12907] 2021-04-28 08:58:51,097 LeveledCompactionStrategy.java:144 - Could not acquire references for compacting SSTables [BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350757-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350755-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350738-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350759-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350761-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350740-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350751-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/
      .... 
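
To check whether a node is stuck in this state, the warnings can be counted per compaction thread. This is a rough sketch that assumes the log line layout shown above (level, thread name, timestamp, source location, message); the function name `summarize` is hypothetical and the sample lines are the truncated ones from this report.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the warning as it appears in the excerpt above.
PATTERN = re.compile(
    r"WARN\s+\[(CompactionExecutor:\d+)\]\s+"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+"
    r"LeveledCompactionStrategy\.java:\d+ - "
    r"Could not acquire references for compacting SSTables"
)


def summarize(lines):
    """Return (warnings per thread, seconds between first and last warning)."""
    per_thread = Counter()
    stamps = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            per_thread[match.group(1)] += 1
            stamps.append(datetime.strptime(match.group(2), "%Y-%m-%d %H:%M:%S"))
    span = (max(stamps) - min(stamps)).total_seconds() if stamps else 0.0
    return per_thread, span


# Truncated lines taken from the excerpt above (paths left cut off as reported).
TAIL = " LeveledCompactionStrategy.java:144 - Could not acquire references for compacting SSTables [BigTableReader(path='/var/lib/cassandra/d"
sample = [
    "WARN [CompactionExecutor:12909] 2021-04-28 08:59:51,241" + TAIL,
    "WARN [CompactionExecutor:12909] 2021-04-28 09:00:19,484" + TAIL,
    "WARN [CompactionExecutor:12908] 2021-04-28 09:00:51,241" + TAIL,
    "WARN [CompactionExecutor:12907] 2021-04-28 08:58:51,097" + TAIL,
]
per_thread, span = summarize(sample)
print(dict(per_thread), span)
```

The same warning repeating across several executor threads over a long span, together with a pending task count that does not drop, matches the symptom described in this report.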
      

      The error has occurred several times over a few weeks and, so far, has always involved LCS tables.

      a.dejanoski pointed me to https://issues.apache.org/jira/browse/CASSANDRA-15242, but I find no log messages like "disk boundaries are out of date for keyspacename.tablename" or "Refreshing disk boundary cache for keyspacename.tablename".

      The workaround is simple: restart the affected node once it has been identified. After the restart, the pending compaction tasks run normally.

      We see the issue on 2 of our clusters running 3.11.10.
      Has anyone else encountered this issue?


            People

              Assignee: Unassigned
              Reporter: regis le bretonnic (easyoups)