Details
-
Bug
-
Status: Resolved
-
Urgent
-
Resolution: Fixed
-
None
-
None
-
Critical
Description
This is just bootstrapping a ~ 180 trunk cluster. After a while, a
node I was on was stuck with thinking all nodes are down, because
gossip stage was backed up, because it was spending a long time
(multiple seconds or more, I suppose RPC timeout maybe) doing the
following. Cluster-wide restart -> back to normal. I have not
investigated further.
"GossipStage:1" daemon prio=10 tid=0x00007f9d5847a800 nid=0xa6fc waiting on condition [0x000000004345f000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000005029ad1c0> (a java.util.concurrent.FutureTask$Sync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281) at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:364) at org.apache.cassandra.service.MigrationManager.rectifySchema(MigrationManager.java:132) at org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:75) at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:802) at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:918) at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
Attachments
Attachments
Issue Links
- is related to
-
CASSANDRA-3882 avoid distributed deadlock in migration stage
- Resolved