Details
Description
Two of the three Cassandra nodes in one of our clusters just started behaving very strange about an hour ago. Within a minute of each other they started logging AssertionErrors (see stack traces here: https://gist.github.com/iconara/7917438) over and over again. The client lost connection with the nodes at roughly the same time. The nodes were still up, and even if no clients were connected to them they continued logging the same errors over and over.
The errors are in the native transport (specifically MessagingService.addCallback) which makes me suspect that it has something to do with a test that we started running this afternoon. I've just implemented support for frame compression in my CQL driver cql-rb. About two hours before this happened I deployed a version of the application which enabled Snappy compression on all frames larger than 64 bytes. It's not impossible that there is a bug somewhere in the driver or compression library that caused this – but at the same time, it feels like it shouldn't be possible to make C* a zombie with a bad frame.
Restarting seems to have got them back running again, but I suspect they will go down again sooner or later.