Details
-
Bug
-
Status: Resolved
-
Urgent
-
Resolution: Fixed
-
None
-
Critical
Description
During commit log replay, if there are materialized views, it's possible for contention on the MV lock to cause a WriteTimeoutException. This makes commit log replay fail, which of course prevents the node from starting up. This generally means that the operator has to move the commitlog segments to avoid replay.
Here's a stacktrace of this happening on 3.0.5:
ERROR [main] 2016-05-25 15:10:31,120 CassandraDaemon.java:692 - Exception encountered during startup java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses. at org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:50) ~[apache-cassandra-3.0.5.jar:3.0.5] at org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:372) ~[apache-cassandra-3.0.5.jar:3.0.5] at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:624) ~[apache-cassandra-3.0.5.jar:3.0.5] at org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:511) ~[apache-cassandra-3.0.5.jar:3.0.5] at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:406) ~[apache-cassandra-3.0.5.jar:3.0.5] at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:153) ~[apache-cassandra-3.0.5.jar:3.0.5] at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:189) ~[apache-cassandra-3.0.5.jar:3.0.5] at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:169) ~[apache-cassandra-3.0.5.jar:3.0.5] at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:283) [apache-cassandra-3.0.5.jar:3.0.5] at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:551) [apache-cassandra-3.0.5.jar:3.0.5] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:679) [apache-cassandra-3.0.5.jar:3.0.5] Caused by: java.util.concurrent.ExecutionException: org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses. at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.get(AbstractLocalAwareExecutorService.java:200) ~[apache-cassandra-3.0.5.jar:3.0.5] at org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:365) ~[apache-cassandra-3.0.5.jar:3.0.5] ... 9 common frames omitted Suppressed: java.util.concurrent.ExecutionException: org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses. ... 11 common frames omitted Caused by: org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses. at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:431) at org.apache.cassandra.db.Keyspace.lambda$apply$62(Keyspace.java:443) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) at java.lang.Thread.run(Thread.java:745)
We should ignore the write_rpc_timeout setting while acquiring MV locks if we're on the commitlog replay path.