Cassandra / CASSANDRA-16638

compactions/repairs hangs (backport CASSANDRA-16552)


    Details

    • Bug Category:
      Degradation
    • Severity:
      Normal
    • Complexity:
      Normal
    • Discovered By:
      User Report
    • Platform:
      All
    • Impacts:
      None

      Description

      Hi

      We have been hitting an issue during repairs (more likely a compaction issue, in fact) since we upgraded from 3.11.1 to 3.11.10.

      We are using Reaper, but the issue does not seem to come from it (according to Alexander DEJANOVSKI). When the problem happens, repairs driven by Reaper are blocked.

      Reaper hangs with the message "All nodes are busy or have too many pending compactions for the remaining candidate segments." and, indeed, one node has a large number of pending compaction tasks:

       

      $ nodetool compactionstats
      pending tasks: 95
      - mt_metrics.metric_32: 95 
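To spot the affected node without reading the output by eye, the text above could be parsed with something like the following. This is only a minimal sketch: it assumes the output keeps the two-line shape shown here (a `pending tasks:` total followed by `- keyspace.table: N` lines), and `parse_pending`, `is_stuck_candidate` and the `threshold` value are hypothetical names and numbers chosen for illustration.

```python
import re


def parse_pending(output: str) -> dict:
    """Parse text shaped like the `nodetool compactionstats` output above.

    Returns the total pending count and a per-table breakdown.
    Lines that match neither pattern are ignored.
    """
    total = None
    per_table = {}
    for raw in output.splitlines():
        line = raw.strip()
        m = re.match(r"pending tasks:\s*(\d+)$", line)
        if m:
            total = int(m.group(1))
            continue
        m = re.match(r"-\s*(\S+):\s*(\d+)$", line)
        if m:
            per_table[m.group(1)] = int(m.group(2))
    return {"total": total, "per_table": per_table}


def is_stuck_candidate(stats: dict, threshold: int = 20) -> bool:
    """Flag a node whose pending total exceeds an arbitrary threshold.

    The default of 20 is an illustrative assumption, not a recommended value.
    """
    return stats["total"] is not None and stats["total"] > threshold


if __name__ == "__main__":
    sample = "pending tasks: 95\n- mt_metrics.metric_32: 95 \n"
    stats = parse_pending(sample)
    print(stats, is_stuck_candidate(stats))
```

In practice the input text would come from running `nodetool compactionstats` on each node; a high count alone does not prove a hang, so the count would need to be sampled over time to confirm it is not decreasing.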
      

      The errors in the log are:

       

      WARN [CompactionExecutor:12909] 2021-04-28 08:59:51,241 LeveledCompactionStrategy.java:144 - Could not acquire references for compacting SSTables [BigTableReader(path='/var/lib/cassandra/d
      ....
      WARN [CompactionExecutor:12909] 2021-04-28 09:00:19,484 LeveledCompactionStrategy.java:144 - Could not acquire references for compacting SSTables [BigTableReader(path='/var/lib/cassandra/d
      ....
      WARN [CompactionExecutor:12908] 2021-04-28 09:00:51,241 LeveledCompactionStrategy.java:144 - Could not acquire references for compacting SSTables [BigTableReader(path='/var/lib/cassandra/d
      ....
      WARN [CompactionExecutor:12907] 2021-04-28 08:58:51,097 LeveledCompactionStrategy.java:144 - Could not acquire references for compacting SSTables [BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350757-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350755-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350738-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350759-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350761-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350740-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350751-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/
      .... 
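The warnings above repeat roughly every 30 seconds across several compaction executor threads. A short script along these lines could summarize how often each thread logs the warning and over what time window. It is a sketch that assumes the log line layout shown in this report (level, thread, timestamp, source file, message); `WARN_RE` and `summarize` are hypothetical names.

```python
import re
from collections import Counter

# Matches the WARN lines quoted in this report; the SSTable list itself is ignored.
WARN_RE = re.compile(
    r"WARN\s+\[(CompactionExecutor:\d+)\]\s+"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\s+"
    r"LeveledCompactionStrategy\.java:\d+ - "
    r"Could not acquire references for compacting SSTables"
)


def summarize(log_lines):
    """Count matching warnings per thread and return the first/last timestamp."""
    per_thread = Counter()
    timestamps = []
    for line in log_lines:
        m = WARN_RE.search(line)
        if m:
            per_thread[m.group(1)] += 1
            timestamps.append(m.group(2))
    # ISO-style timestamps sort correctly as plain strings.
    window = (min(timestamps), max(timestamps)) if timestamps else None
    return per_thread, window


if __name__ == "__main__":
    import sys

    # Usage (path is up to the operator): python summarize.py < system.log
    counts, window = summarize(sys.stdin)
    print(dict(counts), window)
```

A steady stream of these warnings for the same table, with the pending task count not going down, would match the symptom described here.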
      

      The error has happened several times in a few weeks and, so far, has always concerned LCS tables.

      a.dejanoski pointed me to https://issues.apache.org/jira/browse/CASSANDRA-15242, but I find no trace of messages like "disk boundaries are out of date for keyspacename.tablename" or "Refreshing disk boundary cache for keyspacename.tablename" in the logs.
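For anyone wanting to repeat that check on their own logs, a sketch like this searches for the two message prefixes quoted above. The exact wording and casing of the real log lines are assumptions taken from this report, so the match is case-insensitive; `MARKERS` and `find_boundary_messages` are hypothetical names.

```python
# Message prefixes as quoted in this report; real log wording may differ.
MARKERS = (
    "disk boundaries are out of date for",
    "refreshing disk boundary cache for",
)


def find_boundary_messages(log_lines, table=None):
    """Return log lines containing either marker, optionally for one keyspace.table."""
    hits = []
    for line in log_lines:
        lowered = line.lower()
        for marker in MARKERS:
            needle = marker if table is None else f"{marker} {table.lower()}"
            if needle in lowered:
                hits.append(line.rstrip("\n"))
                break
    return hits


if __name__ == "__main__":
    import sys

    # Usage (path is up to the operator): python check.py < system.log
    for hit in find_boundary_messages(sys.stdin, table="mt_metrics.metric_32"):
        print(hit)
```

An empty result is consistent with what is reported here, namely that neither message appears.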

      The workaround is simple: restart the affected node once it has been identified. After the restart, the pending compaction tasks run normally.

      We see the issue on 2 of our clusters running 3.11.10.
      Has anyone else encountered this issue?

              People

              • Assignee:
                Unassigned
              • Reporter:
                easyoups regis le bretonnic
              • Votes:
                0
              • Watchers:
                3
