Details
Bug
Status: Open
Normal
Resolution: Unresolved
None
Normal
Description
I believe this is related/similar to CASSANDRA-11373, but I'm running 3.5 and I still have this issue.
AFAICT, this happens when getCompactionCandidates in LeveledManifest.java returns a candidate that does not exist on disk.
Eventually, all the compaction threads back up, garbage collections start taking upwards of 20 seconds, and messages start being dropped.
To get around this, I patched my instance with the following code in LeveledManifest.java:

    // Collect candidates whose data file is missing on disk.
    Set<SSTableReader> removeCandidates = new HashSet<>();
    for (SSTableReader sstable : candidates)
    {
        if (!(new java.io.File(sstable.getFilename())).exists())
        {
            removeCandidates.add(sstable);
            logger.warn("Not compacting candidate {} because it does not exist ({}).", sstable.getFilename(), sstable.openReason);
        }
    }
    // Drop the missing candidates; a compaction needs at least two sstables.
    candidates.removeAll(removeCandidates);
    if (candidates.size() < 2)
        return Collections.emptyList();
    else
        return candidates;
This just removes any candidate that doesn't exist on disk; however, I'm not sure what the side effects of this are.
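
For anyone who wants to try the filtering logic outside of Cassandra, here is a minimal stand-alone sketch of the same idea. It is not Cassandra code: the class name `MissingCandidateFilter` and the method `filterExisting` are hypothetical, plain path strings stand in for `SSTableReader`, and `System.err` stands in for the logger. It only shows the "drop candidates missing on disk, give up if fewer than two remain" behaviour of the patch.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-alone sketch; plain path strings stand in for SSTableReader.
class MissingCandidateFilter {

    /**
     * Returns the candidate paths that exist on disk, or an empty list
     * when fewer than two survive (nothing worth compacting).
     */
    static List<String> filterExisting(Collection<String> candidatePaths) {
        List<String> existing = new ArrayList<>();
        for (String path : candidatePaths) {
            if (new File(path).exists()) {
                existing.add(path);
            } else {
                // Stand-in for logger.warn(...) in the patch.
                System.err.println("Not compacting candidate " + path
                        + " because it does not exist.");
            }
        }
        if (existing.size() < 2) {
            return Collections.emptyList();
        }
        return existing;
    }

    public static void main(String[] args) {
        // Usage: pass candidate file paths as arguments; surviving ones are printed.
        List<String> candidates = new ArrayList<>();
        Collections.addAll(candidates, args);
        for (String path : filterExisting(candidates)) {
            System.out.println(path);
        }
    }
}
```

Note that, like the patch, this only checks for existence at the moment of the call; a file could still disappear between the check and the actual compaction, and the sketch does nothing about whatever left the stale candidate in the manifest in the first place.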