I believe this is related/similar to CASSANDRA-11373, but I'm running 3.5 and I still have this issue.
AFAICT, this happens when getCompactionCandidates in LeveledManifest.java returns a candidate that does not exist on disk.
Eventually, all the compaction threads back up, garbage collections start taking upwards of 20 seconds, and messages start being dropped.
To get around this, I patched my instance with the following code in LeveledManifest.java:
This just removes any candidate that doesn't exist on disk; however, I'm not sure what the side effects of this are.
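
For readers who want to see the shape of the workaround, here is a minimal, standalone sketch of the idea (drop any candidate whose file is no longer on disk). It is not the actual patch and does not use Cassandra's classes: the class name CandidateFilter, the method existingOnly, and the use of plain java.nio.file.Path values in place of Cassandra's SSTable objects are all assumptions made for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper, not part of Cassandra. It stands in for filtering
// the result of getCompactionCandidates before compaction is attempted.
public final class CandidateFilter {
    private CandidateFilter() {}

    /**
     * Returns the candidates whose paths exist on disk right now,
     * preserving the input order. Candidates that are missing are skipped.
     */
    public static List<Path> existingOnly(Collection<Path> candidates) {
        List<Path> present = new ArrayList<>();
        for (Path candidate : candidates) {
            // Files.exists is a point-in-time check; the file can still
            // disappear between this check and the compaction itself.
            if (Files.exists(candidate)) {
                present.add(candidate);
            }
        }
        return present;
    }
}
```

Note that a check like this only hides the symptom: it does not explain why a missing file was offered as a candidate in the first place, and the existence check can race with concurrent deletions.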