
CASSANDRA-16638: compactions/repairs hang (backport CASSANDRA-16552)


    Description

      Hi,

      We have been hitting an issue during repairs (though in fact it is more likely a compaction issue) since we upgraded from 3.11.1 to 3.11.10.

      We are using Reaper, but the issue does not seem to come from it (according to Alexander DEJANOVSKI). When the problem happens, repairs driven by Reaper are blocked.

      Basically, Reaper hangs with the message "All nodes are busy or have too many pending compactions for the remaining candidate segments.", and indeed one node has a lot of pending compaction tasks:

       

      $ nodetool compactionstats
      pending tasks: 95
      - mt_metrics.metric_32: 95 
      
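      For what it's worth, a quick way to spot which node is accumulating tasks across the cluster (the hostnames below are placeholders) is something like:

      # check pending compactions on every node; hostnames are examples only
      for host in node1 node2 node3; do
          echo "== $host =="
          nodetool -h "$host" compactionstats | grep "pending tasks"
      done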

      The errors in the log are:

       

      WARN [CompactionExecutor:12909] 2021-04-28 08:59:51,241 LeveledCompactionStrategy.java:144 - Could not acquire references for compacting SSTables [BigTableReader(path='/var/lib/cassandra/d
      ....
      WARN [CompactionExecutor:12909] 2021-04-28 09:00:19,484 LeveledCompactionStrategy.java:144 - Could not acquire references for compacting SSTables [BigTableReader(path='/var/lib/cassandra/d
      ....
      WARN [CompactionExecutor:12908] 2021-04-28 09:00:51,241 LeveledCompactionStrategy.java:144 - Could not acquire references for compacting SSTables [BigTableReader(path='/var/lib/cassandra/d
      ....
      WARN [CompactionExecutor:12907] 2021-04-28 08:58:51,097 LeveledCompactionStrategy.java:144 - Could not acquire references for compacting SSTables [BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350757-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350755-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350738-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350759-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350761-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350740-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350751-big-Data.db'), BigTableReader(path='/var/lib/cassandra/data/mt_metrics/
      .... 
      
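      To get an idea of how often this happens, the warning can simply be counted in the logs (adjust the path to your install), e.g.:

      # count the "Could not acquire references" warnings per day
      grep "Could not acquire references for compacting SSTables" /var/log/cassandra/system.log \
          | awk '{print $3}' | sort | uniq -c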

      The error has happened several times over the last few weeks, and so far it always concerns LCS tables.

      a.dejanoski pointed me to https://issues.apache.org/jira/browse/CASSANDRA-15242, but I have no trace of messages like "disk boundaries are out of date for keyspacename.tablename" or "Refreshing disk boundary cache for keyspacename.tablename".
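      Checking for those messages is just a grep over the logs (the log path assumes a default package install), e.g.:

      # look for the CASSANDRA-15242 disk boundary messages; nothing comes up here
      grep -E "disk boundaries are out of date|Refreshing disk boundary cache" \
          /var/log/cassandra/system.log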

      The workaround is simple: just restart the node once it has been identified. The pending compaction tasks then run fine again.
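      Concretely, the restart looks something like this (the service name depends on the install):

      # drain the node so it flushes memtables and stops accepting requests,
      # then restart the Cassandra service (service name is an assumption)
      nodetool drain
      sudo systemctl restart cassandra
      # compactions start catching up again after the restart
      nodetool compactionstats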

      We have the issue on 2 of our clusters running 3.11.10.
      Has anyone else met this issue?

          People

            Assignee: Unassigned
            Reporter: regis le bretonnic (easyoups)
