CASSANDRA-17012

Broken Pipe exception while replacing a failed node

    Description

      We are encountering the following error:

      ERROR [STREAM-OUT-/NewNode] 2021-09-26 14:44:06,554 StreamSession.java:470 - [Stream #23a2c560-1ed5-11ec-8351-2f2e5cc09cec] Streaming error occurred
      java.io.IOException: Broken pipe
      	at sun.nio.ch.FileChannelImpl.transferTo0(Native Method) ~[na:1.7.0_67]
      	at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:433) ~[na:1.7.0_67]
      	at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:565) ~[na:1.7.0_67]
      	at org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:74) ~[apache-cassandra-2.1.1.jar:2.1.1]
      	at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:56) ~[apache-cassandra-2.1.1.jar:2.1.1]
      	at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:40) ~[apache-cassandra-2.1.1.jar:2.1.1]
      	at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45) ~[apache-cassandra-2.1.1.jar:2.1.1]
      	at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:346) [apache-cassandra-2.1.1.jar:2.1.1]
      	at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:318) [apache-cassandra-2.1.1.jar:2.1.1]
      	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]
      INFO  [STREAM-OUT-/NewNode] 2021-09-26 14:44:06,559 StreamResultFuture.java:180 - [Stream #23a2c560-1ed5-11ec-8351-2f2e5cc09cec] Session with /NewNode is complete
      WARN  [STREAM-OUT-/NewNode] 2021-09-26 14:44:06,560 StreamResultFuture.java:207 - [Stream #23a2c560-1ed5-11ec-8351-2f2e5cc09cec] Stream failed
      

      This error occurs approximately 15 minutes into bootstrapping a replacement for a failed node in our 10-node ring, and it appears to prevent the new node from joining the ring. When one of the nodes it is streaming data from hits the broken pipe exception shown above, the new node logs no corresponding error. We suspect this may be related to, or a duplicate of, CASSANDRA-10961; however, we are not seeing the "Not enough bytes" error on the new node.
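      To compare what each node logged for the same streaming session, the stream ID can be pulled out of the log lines and used as a grouping key. The following is only a minimal sketch: the regular expression is modeled on the three log lines quoted in this report, and the names `LOG_LINE`, `stream_events`, and `SAMPLE` are hypothetical helpers, not part of Cassandra.

```python
import re
from collections import defaultdict

# Assumed line layout, taken from the log excerpt in this report:
#   LEVEL [thread] YYYY-MM-DD HH:MM:SS,mmm Source.java:NNN - [Stream #<id>] message
LOG_LINE = re.compile(
    r"^\s*(?P<level>ERROR|WARN|INFO|DEBUG)\s+"
    r"\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<source>\S+)\s+-\s+"
    r"\[Stream #(?P<stream_id>[0-9a-f-]+)\]\s+"
    r"(?P<message>.*)$"
)


def stream_events(lines):
    """Group (timestamp, level, message) tuples by stream ID.

    Lines that do not match the assumed layout (for example stack
    frames) are skipped.
    """
    events = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            events[match["stream_id"]].append(
                (match["timestamp"], match["level"], match["message"])
            )
    return dict(events)


# Sample input copied from the excerpt in this report.
SAMPLE = [
    "ERROR [STREAM-OUT-/NewNode] 2021-09-26 14:44:06,554 StreamSession.java:470 - "
    "[Stream #23a2c560-1ed5-11ec-8351-2f2e5cc09cec] Streaming error occurred",
    "java.io.IOException: Broken pipe",
    "\tat sun.nio.ch.FileChannelImpl.transferTo0(Native Method) ~[na:1.7.0_67]",
    "INFO  [STREAM-OUT-/NewNode] 2021-09-26 14:44:06,559 StreamResultFuture.java:180 - "
    "[Stream #23a2c560-1ed5-11ec-8351-2f2e5cc09cec] Session with /NewNode is complete",
    "WARN  [STREAM-OUT-/NewNode] 2021-09-26 14:44:06,560 StreamResultFuture.java:207 - "
    "[Stream #23a2c560-1ed5-11ec-8351-2f2e5cc09cec] Stream failed",
]

if __name__ == "__main__":
    for stream_id, entries in stream_events(SAMPLE).items():
        print(stream_id)
        for timestamp, level, message in entries:
            print(f"  {timestamp} {level:5} {message}")
```

      Running the same function over the log of the new node and of each source node would show whether the new node recorded anything at all for the failing stream ID.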

      Context:

      • All nodes in the cluster are running 2.1.1 currently
      • The cluster is currently down a node, so it is unclear how we could apply a patch upgrade to verify the fix from the linked (and possibly related) issue: doing so would require bootstrapping and upgrading the new node at the same time
      • We've restarted this process numerous times with the same result
      • The replication factor is set to 3
      • Reads and writes both require quorum
      • Each node has about 1.5TB of data

          People

            Assignee: Unassigned
            Reporter: Roatin