  1. Cassandra
  2. CASSANDRA-12830

Stream failed during nodetool REBUILD in a multi-dc AWS env C* version 3.5


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Not A Problem
    • None
    • None
    • None
    • Environment: CentOS release 6.7
      Linux 2.6.32-573.1.1.el6.x86_64

    • Normal

    Description

      We are running a multi-DC (i.e., multi-region) cluster in AWS. One of the nodes in "us-west" appeared to have corrupted SSTables, and after multiple attempts to run sstablescrub failed, I cleared the data and commitlog directories, restarted the node, and launched a rebuild task:

      sudo nodetool rebuild us-east
      

      Note that I tried to rebuild from a different DC (AWS region).

      However, about two-thirds of the way through, the process failed. The error on the nodetool command's stderr was:

      error: Error while rebuilding node: Stream failed
      -- StackTrace --
      java.lang.RuntimeException: Error while rebuilding node: Stream failed
      	at org.apache.cassandra.service.StorageService.rebuild(StorageService.java:1172)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
      	at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
      	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
      	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
      	at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
      	at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
      	at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
      	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
      	at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
      	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
      	at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
      	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
      	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
      	at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
      	at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:324)
      	at sun.rmi.transport.Transport$1.run(Transport.java:200)
      	at sun.rmi.transport.Transport$1.run(Transport.java:197)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
      	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
      	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
      	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      

      In /var/log/cassandra/system.log:

      INFO  [StreamReceiveTask:4] 2016-10-23 08:05:08,843 StreamResultFuture.java:185 - [Stream #5f22eed0-98bb-11e6-8bac-8d90ab5dafcf] Session with /54.82.131.4 is complete
      ERROR [STREAM-OUT-/54.82.131.4] 2016-10-23 08:05:08,844 StreamSession.java:519 - [Stream #5f22eed0-98bb-11e6-8bac-8d90ab5dafcf] Streaming error occurred
      java.net.SocketException: Broken pipe
              at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.8.0_102]
              at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) ~[na:1.8.0_102]
              at java.net.SocketOutputStream.write(SocketOutputStream.java:153) ~[na:1.8.0_102]
              at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431) ~[na:1.8.0_102]
              at sun.security.ssl.OutputRecord.write(OutputRecord.java:417) ~[na:1.8.0_102]
              at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:876) ~[na:1.8.0_102]
              at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:847) ~[na:1.8.0_102]
              at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123) ~[na:1.8.0_102]
              at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.8.0_102]
              at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[na:1.8.0_102]
              at org.apache.cassandra.io.util.WrappedDataOutputStreamPlus.flush(WrappedDataOutputStreamPlus.java:66) ~[apache-cassandra-3.5.0.jar:3.5.0]
              at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:371) [apache-cassandra-3.5.0.jar:3.5.0]
              at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:342) [apache-cassandra-3.5.0.jar:3.5.0]
              at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
      WARN  [StreamReceiveTask:4] 2016-10-23 08:05:08,871 StreamResultFuture.java:212 - [Stream #5f22eed0-98bb-11e6-8bac-8d90ab5dafcf] Stream failed
      ERROR [RMI TCP Connection(9)-127.0.0.1] 2016-10-23 08:05:08,872 StorageService.java:1171 - Error while rebuilding node
      org.apache.cassandra.streaming.StreamException: Stream failed
              at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:213) ~[apache-cassandra-3.5.0.jar:3.5.0]
              at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:189) ~[apache-cassandra-3.5.0.jar:3.5.0]
              at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:429) ~[apache-cassandra-3.5.0.jar:3.5.0]
              at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:706) ~[apache-cassandra-3.5.0.jar:3.5.0]
              at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:667) ~[apache-cassandra-3.5.0.jar:3.5.0]
              at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:211) ~[apache-cassandra-3.5.0.jar:3.5.0]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_102]
              at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_102]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_102]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_102]
              at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_102]
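
      When triaging a failure like this, it can help to pull every line for one stream session out of system.log and see which peer and thread reported the first error. The following is a minimal sketch, not part of Cassandra: the names LINE_RE, stream_events and SAMPLE are hypothetical, and the regular expression assumes the log layout shown in the excerpt above (level, thread in brackets, timestamp, source file, then a "[Stream #<id>]" tag).

```python
import re
from collections import defaultdict

# Assumed layout, taken from the excerpt above:
# LEVEL [thread] yyyy-mm-dd HH:MM:SS,mmm Source.java:NNN - [Stream #<id>] message
LINE_RE = re.compile(
    r"^(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<src>\S+)\s+-\s+\[Stream #(?P<stream_id>[0-9a-f-]+)\]\s+(?P<msg>.*)$"
)


def stream_events(lines):
    """Group streaming log lines by stream session ID.

    Lines without a "[Stream #...]" tag (stack frames, unrelated
    messages) are skipped.
    """
    events = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            events[match.group("stream_id")].append(
                (
                    match.group("ts"),
                    match.group("level"),
                    match.group("thread"),
                    match.group("msg"),
                )
            )
    return dict(events)


# Three tagged lines and one stack frame copied from the log excerpt above.
SAMPLE = [
    "INFO  [StreamReceiveTask:4] 2016-10-23 08:05:08,843 StreamResultFuture.java:185 - [Stream #5f22eed0-98bb-11e6-8bac-8d90ab5dafcf] Session with /54.82.131.4 is complete",
    "ERROR [STREAM-OUT-/54.82.131.4] 2016-10-23 08:05:08,844 StreamSession.java:519 - [Stream #5f22eed0-98bb-11e6-8bac-8d90ab5dafcf] Streaming error occurred",
    "java.net.SocketException: Broken pipe",
    "WARN  [StreamReceiveTask:4] 2016-10-23 08:05:08,871 StreamResultFuture.java:212 - [Stream #5f22eed0-98bb-11e6-8bac-8d90ab5dafcf] Stream failed",
]

if __name__ == "__main__":
    for stream_id, rows in stream_events(SAMPLE).items():
        print(stream_id)
        for ts, level, thread, msg in rows:
            print(f"  {ts} {level:<5} {thread}: {msg}")
```

      To run it against a real log, pass an open file instead of SAMPLE, for example stream_events(open("/var/log/cassandra/system.log")). In the excerpt above the only ERROR for the session comes from the STREAM-OUT thread for /54.82.131.4, one millisecond after the session with that peer was logged as complete.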
      

            People

              Assignee: Unassigned
              Reporter: Bing Wu (bing1wu)
              Votes: 0
              Watchers: 3
