Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-14525

streaming failure during bootstrap makes new node into inconsistent state

    XMLWordPrintableJSON

Details

    • Availability - Unavailable
    • Normal

    Description

      If bootstrap fails for newly joining node (most common reason is due to streaming failure) then Cassandra state remains in joining state which is fine but Cassandra also enables Native transport which makes overall state inconsistent. This further creates NullPointer exception if auth is enabled on the new node, please find reproducible steps here:

      For example if bootstrap fails due to streaming errors like

      java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
      at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-18.0.jar:na]
      at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-18.0.jar:na]
      at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-18.0.jar:na]
      at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256) [apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894) [apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.service.StorageService.initServer(StorageService.java:660) [apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.service.StorageService.initServer(StorageService.java:573) [apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) [apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567) [apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) [apache-cassandra-3.0.16.jar:3.0.16]
      Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
      at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) ~[guava-18.0.jar:na]
      at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]
      at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-18.0.jar:na]
      at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]
      at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-18.0.jar:na]
      at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]

      then variable StorageService.java::dataAvailable will be false. Since dataAvailable is false hence it will not call StorageService.java::finishJoiningRing and as a result StorageService.java::doAuthSetup will not be invoked.

      API StorageService.java::joinTokenRing returns without any problem. After this CassandraDaemon.java::start is invoked which starts native transport at
      CassandraDaemon.java::startNativeTransport

      At this point daemon’s bootstrap is still not finished and transport is enabled. So client will connect to the node and will encounter java.lang.NullPointerException as following:

      ERROR [SharedPool-Worker-2] Message.java:647 - Unexpected exception during request; channel = [id: 0x412a26b3, L:/a.b.c.d:9042 - R:/p.q.r.s:20121]
      java.lang.NullPointerException: null
      at org.apache.cassandra.auth.PasswordAuthenticator.doAuthenticate(PasswordAuthenticator.java:160) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.auth.PasswordAuthenticator.authenticate(PasswordAuthenticator.java:82) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.auth.PasswordAuthenticator.access$100(PasswordAuthenticator.java:54) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.auth.PasswordAuthenticator$PlainTextSaslAuthenticator.getAuthenticatedUser(PasswordAuthenticator.java:198) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:78) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:535) [apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:429) [apache-cassandra-3.0.16.jar:3.0.16]
      at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.1.0.CR6.jar:4.1.0.CR6]
      at io.netty.channel.ChannelHandlerInvokerUtil.invokeChannelReadNow(ChannelHandlerInvokerUtil.java:83) [netty-all-4.1.0.CR6.jar:4.1.0.CR6]
      at io.netty.channel.DefaultChannelHandlerInvoker$7.run(DefaultChannelHandlerInvoker.java:159) [netty-all-4.1.0.CR6.jar:4.1.0.CR6]
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_121]
      at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) [apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.0.16.jar:3.0.16]
      at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]

      At this point if we run nodetool status then it will show this new node in UJ state, however clients can connect to this node over CQL and will receive java.lang.NullPointerException

      Attachments

        Issue Links

          There are no Sub-Tasks for this issue.

          Activity

            People

              chovatia.jaydeep@gmail.com Jaydeepkumar Chovatia
              chovatia.jaydeep@gmail.com Jaydeepkumar Chovatia
              Jaydeepkumar Chovatia
              Kurt Greaves
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: