Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-14525

streaming failure during bootstrap makes new node into inconsistent state

    XMLWordPrintableJSON

    Details

    • Severity:
      Normal

      Description

      If bootstrap fails for newly joining node (most common reason is due to streaming failure) then Cassandra state remains in joining state which is fine but Cassandra also enables Native transport which makes overall state inconsistent. This further creates NullPointer exception if auth is enabled on the new node, please find reproducible steps here:

      For example if bootstrap fails due to streaming errors like

      java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
      at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-18.0.jar:na]
      at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-18.0.jar:na]
      at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-18.0.jar:na]
      at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256) [apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894) [apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.service.StorageService.initServer(StorageService.java:660) [apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.service.StorageService.initServer(StorageService.java:573) [apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) [apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567) [apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) [apache-cassandra-3.0.16.jar:3.0.16]
      Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
      at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) ~[guava-18.0.jar:na]
      at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]
      at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-18.0.jar:na]
      at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]
      at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-18.0.jar:na]
      at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]

      then variable StorageService.java::dataAvailable will be false. Since dataAvailable is false hence it will not call StorageService.java::finishJoiningRing and as a result StorageService.java::doAuthSetup will not be invoked.

      API StorageService.java::joinTokenRing returns without any problem. After this CassandraDaemon.java::start is invoked which starts native transport at
      CassandraDaemon.java::startNativeTransport

      At this point daemon’s bootstrap is still not finished and transport is enabled. So client will connect to the node and will encounter java.lang.NullPointerException as following:

      ERROR [SharedPool-Worker-2] Message.java:647 - Unexpected exception during request; channel = [id: 0x412a26b3, L:/a.b.c.d:9042 - R:/p.q.r.s:20121]
      java.lang.NullPointerException: null
      at org.apache.cassandra.auth.PasswordAuthenticator.doAuthenticate(PasswordAuthenticator.java:160) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.auth.PasswordAuthenticator.authenticate(PasswordAuthenticator.java:82) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.auth.PasswordAuthenticator.access$100(PasswordAuthenticator.java:54) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.auth.PasswordAuthenticator$PlainTextSaslAuthenticator.getAuthenticatedUser(PasswordAuthenticator.java:198) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:78) ~[apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:535) [apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:429) [apache-cassandra-3.0.16.jar:3.0.16]
      at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.1.0.CR6.jar:4.1.0.CR6]
      at io.netty.channel.ChannelHandlerInvokerUtil.invokeChannelReadNow(ChannelHandlerInvokerUtil.java:83) [netty-all-4.1.0.CR6.jar:4.1.0.CR6]
      at io.netty.channel.DefaultChannelHandlerInvoker$7.run(DefaultChannelHandlerInvoker.java:159) [netty-all-4.1.0.CR6.jar:4.1.0.CR6]
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_121]
      at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) [apache-cassandra-3.0.16.jar:3.0.16]
      at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.0.16.jar:3.0.16]
      at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]

      At this point if we run nodetool status then it will show this new node in UJ state, however clients can connect to this node over CQL and will receive java.lang.NullPointerException

        Attachments

        There are no Sub-Tasks for this issue.

          Activity

            People

            • Assignee:
              chovatia.jaydeep@gmail.com Jaydeepkumar Chovatia
              Reporter:
              chovatia.jaydeep@gmail.com Jaydeepkumar Chovatia
              Authors:
              Jaydeepkumar Chovatia
              Reviewers:
              Kurt Greaves
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: