Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-13362

Cassandra 2.1.15 main thread stuck in logback stack trace upon joining existing cluster

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Invalid
    • None
    • None
    • None
    • Normal

    Description

      Switching from Cassandra 2.0.17 to Cassandra 2.1.15 (DSC edition: dsc-cassandra-2.1.15-bin.tar.gz) in a local VM based Linux environment for installer verification tests.

      [root@localhost jdk1.8.0_102]# lsb_release -d
      Description:	CentOS release 6.7 (Final)
      You have new mail in /var/spool/mail/root
      [root@localhost jdk1.8.0_102]# uname -a
      Linux localhost 2.6.32-573.el6.x86_64 #1 SMP Thu Jul 23 15:44:03 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
      

      The test environment is started from scratch, thus in the following scenario not an upgrade from 2.0 to 2.1, but a fresh 2.1 installation.

      The first node started up fine, but when extending the cluster with a second node, the second node hangs in the following Cassandra log output while starting up, joining the existing node:

      INFO  [InternalResponseStage:1] 2017-03-21 21:10:43,864 DefsTables.java:373 - Loading org.apache.cassandra.config.CFMetaData@1c3daf27[cfId=a8cb1eb0-0e61-11e7-9a56-b999920ca863,ksName=ruxitdb,cfName=EventQueue,cf$
      INFO  [main] 2017-03-21 21:11:11,404 StorageService.java:1138 - JOINING: schema complete, ready to bootstrap
      ...
      INFO  [main] 2017-03-22 03:13:36,148 StorageService.java:1138 - JOINING: waiting for pending range calculation
      INFO  [main] 2017-03-22 03:13:36,149 StorageService.java:1138 - JOINING: calculation complete, ready to bootstrap
      INFO  [main] 2017-03-22 03:13:36,156 StorageService.java:1138 - JOINING: getting bootstrap token
      ...
      

      So, basically it was stuck on 2017-03-21 21:11:11,404 and the main thread somehow continued on 2017-03-22 03:13:36,148, ~ 6 hours later.

      I have two thread dumps. The first from 21:30:
      td___2017-03-21-21-30-09.tdump

      and a second one ~ 100min later:
      td___2017-03-21-23-09-59.tdump

      Both thread dumps have in common, that the main thread is stuck in some logback code:

      "main" #1 prio=5 os_prio=0 tid=0x00007fe93821a800 nid=0x4d4e waiting on condition [0x00007fe93c813000]
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x00000000c861bb88> (a java.util.concurrent.locks.ReentrantLock$FairSync)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
      	at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
      	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
      	at ch.qos.logback.core.OutputStreamAppender.subAppend(OutputStreamAppender.java:217)
      	at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:103)
      	at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:88)
      	at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:48)
      	at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:273)
      	at ch.qos.logback.classic.Logger.callAppenders(Logger.java:260)
      	at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:442)
      	at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:396)
      	at ch.qos.logback.classic.Logger.info(Logger.java:600)
      	at org.apache.cassandra.service.StorageService.setMode(StorageService.java:1138)
      	at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:870)
      	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:740)
      	- locked <0x00000000c85d37d8> (a org.apache.cassandra.service.StorageService)
      	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:617)
      	- locked <0x00000000c85d37d8> (a org.apache.cassandra.service.StorageService)
      	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391)
      	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566)
      	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655)
      

      Attachments

        1. td___2017-03-21-21-30-09.tdump
          37 kB
          Thomas Steinmaurer
        2. td___2017-03-21-23-09-59.tdump
          40 kB
          Thomas Steinmaurer

        Activity

          People

            Unassigned Unassigned
            tsteinmaurer Thomas Steinmaurer
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: