Cassandra: CASSANDRA-3776

Streaming task hangs forever during repair after unexpected connection reset by peer

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Fix Version/s: None
    • Component/s: Core
    • Labels:
      None
    • Environment:

      Windows Server 2008 R2
      Sun Java 7u2 64bit

      Description

      During streaming (repair), the stream-receiving node threw the following exception, which was logged twice:

      ERROR [Streaming:1] 2012-01-24 10:17:03,828 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Streaming:1,1,main]
      java.lang.RuntimeException: java.net.SocketException: Connection reset by peer: socket write error
      at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:689)
      at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.lang.Thread.run(Unknown Source)
      Caused by: java.net.SocketException: Connection reset by peer: socket write error
      at java.net.SocketOutputStream.socketWrite0(Native Method)
      at java.net.SocketOutputStream.socketWrite(Unknown Source)
      at java.net.SocketOutputStream.write(Unknown Source)
      at com.ning.compress.lzf.LZFChunk.writeCompressedHeader(LZFChunk.java:77)
      at com.ning.compress.lzf.ChunkEncoder.encodeAndWriteChunk(ChunkEncoder.java:132)
      at com.ning.compress.lzf.LZFOutputStream.writeCompressedBlock(LZFOutputStream.java:203)
      at com.ning.compress.lzf.LZFOutputStream.write(LZFOutputStream.java:97)
      at org.apache.cassandra.streaming.FileStreamTask.write(FileStreamTask.java:181)
      at org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:145)
      at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
      at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
      ... 3 more
      ERROR [Streaming:1] 2012-01-24 10:17:03,891 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Streaming:1,1,main]
      java.lang.RuntimeException: java.net.SocketException: Connection reset by peer: socket write error
      at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:689)
      at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.lang.Thread.run(Unknown Source)
      Caused by: java.net.SocketException: Connection reset by peer: socket write error
      at java.net.SocketOutputStream.socketWrite0(Native Method)
      at java.net.SocketOutputStream.socketWrite(Unknown Source)
      at java.net.SocketOutputStream.write(Unknown Source)
      at com.ning.compress.lzf.LZFChunk.writeCompressedHeader(LZFChunk.java:77)
      at com.ning.compress.lzf.ChunkEncoder.encodeAndWriteChunk(ChunkEncoder.java:132)
      at com.ning.compress.lzf.LZFOutputStream.writeCompressedBlock(LZFOutputStream.java:203)
      at com.ning.compress.lzf.LZFOutputStream.write(LZFOutputStream.java:97)
      at org.apache.cassandra.streaming.FileStreamTask.write(FileStreamTask.java:181)
      at org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:145)
      at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
      at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
      ... 3 more
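Both traces follow the same path: FileStreamTask.runMayThrow is called from WrappedRunnable.run, the checked SocketException is wrapped into a RuntimeException via FBUtilities.unchecked, and that RuntimeException escapes the worker thread and is logged by AbstractCassandraDaemon as a fatal exception. The sketch below models only that wrapping. WrappedRunnable here is a hypothetical re-creation for illustration, not Cassandra's actual class; writeFile-style failure is simulated by throwing a SocketException directly; and a plain Thread with its own uncaught-exception handler stands in for the ThreadPoolExecutor and the daemon's logger so the result can be inspected deterministically.

```java
import java.io.IOException;
import java.net.SocketException;
import java.util.concurrent.atomic.AtomicReference;

public class WrappedRunnableDemo {
    /** Hypothetical analogue of the WrappedRunnable frames in the trace (not Cassandra's code). */
    abstract static class WrappedRunnable implements Runnable {
        protected abstract void runMayThrow() throws Exception;

        @Override
        public final void run() {
            try {
                runMayThrow();
            } catch (RuntimeException e) {
                throw e;
            } catch (Exception e) {
                // Checked exceptions are rewrapped, similar to FBUtilities.unchecked in the trace.
                throw new RuntimeException(e);
            }
        }
    }

    /** Runs a task that fails like FileStreamTask.write did; returns what the thread's handler saw. */
    static Throwable simulateFailedStreamTask() {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        Runnable task = new WrappedRunnable() {
            @Override
            protected void runMayThrow() throws IOException {
                // Simulated failure; no real socket is opened.
                throw new SocketException("Connection reset by peer: socket write error");
            }
        };
        Thread t = new Thread(task, "Streaming:1");
        // Plays the role of the daemon's handler that logs "Fatal exception in thread ...".
        t.setUncaughtExceptionHandler((thread, ex) -> seen.set(ex));
        t.start();
        try {
            t.join(); // the handler runs on the dying thread before join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        Throwable ex = simulateFailedStreamTask();
        System.out.println(ex);
        System.out.println("Caused by: " + ex.getCause());
    }
}
```

Running main prints "java.lang.RuntimeException: java.net.SocketException: Connection reset by peer: socket write error", the same shape as the first line of each logged trace. Note that once the exception escapes, the thread simply ends; nothing in this path notifies whoever is waiting for the stream to finish.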

      After that, streaming hung forever.

      A few seconds later, the sending node logged an exception (possibly unrelated):
      ERROR [Thread-17224] 2012-01-24 10:17:07,817 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-17224,5,main]
      java.lang.ArrayIndexOutOfBoundsException

      Other than that, the nodes behave normally and communicate with each other.

        Activity

        Viktor Jevdokimov created issue -
        Jonathan Ellis added a comment -

        To be specific: did none of the nodes involved go down? Was anything else unusual correlated with the reset? Can you reproduce this?

        Jonathan Ellis made changes -
        Field: Original Value → New Value
        Assignee: (none) → Yuki Morishita [ yukim ]
        Fix Version/s: (none) → 1.0.8 [ 12319453 ]
        Priority: Major [ 3 ] → Minor [ 4 ]
        Component/s: (none) → Core [ 12312978 ]
        Sylvain Lebresne made changes -
        Fix Version/s: (none) → 1.0.9 [ 12319856 ]
        Fix Version/s: 1.0.8 [ 12319453 ] → (none)
        Yuki Morishita added a comment -

        I have not been able to reproduce this myself yet, but it should happen whenever FileStreamTask gets an exception.
        I would like to fix this with CASSANDRA-4051, which is targeted for v1.1.
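Yuki's comment places the hang at the point where FileStreamTask gets an exception. The ticket does not show how the streaming session waits for its tasks, so the sketch below is only a minimal model under a stated assumption: a waiter blocks on a completion signal that the task sends on success. If the task fails without also signalling failure, an untimed wait never returns, which matches the reported symptom. StreamSession, writeFile, streamTask and runOnce are hypothetical names, not Cassandra's API, and the actual fix tracked in CASSANDRA-4051 may work differently. For simplicity the buggy path swallows the exception instead of letting it kill the thread; from the waiter's point of view the effect is the same.

```java
import java.io.IOException;
import java.net.SocketException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

public class StreamCompletionDemo {
    /** Hypothetical session that waits for a single outbound file stream. */
    static final class StreamSession {
        private final CountDownLatch done = new CountDownLatch(1);
        private final AtomicReference<Throwable> failure = new AtomicReference<>();

        void complete() {
            done.countDown();
        }

        void fail(Throwable t) {
            failure.compareAndSet(null, t);
            done.countDown(); // wake the waiter on failure too, not only on success
        }

        /** Returns true on success, false on failure; bounded so a lost signal cannot block forever. */
        boolean await(long timeout, TimeUnit unit) throws InterruptedException, TimeoutException {
            if (!done.await(timeout, unit)) {
                throw new TimeoutException("no completion signal after " + timeout + " " + unit);
            }
            return failure.get() == null;
        }
    }

    /** Hypothetical stand-in for FileStreamTask.write: always fails like the reported socket write. */
    static void writeFile() throws IOException {
        throw new SocketException("Connection reset by peer: socket write error");
    }

    static Runnable streamTask(StreamSession session, boolean signalFailure) {
        return () -> {
            try {
                writeFile();
                session.complete();
            } catch (IOException e) {
                // With signalFailure == false this models the reported bug: the task ends
                // without telling the session, so an untimed wait would never return.
                if (signalFailure) {
                    session.fail(e);
                }
            }
        };
    }

    static String runOnce(boolean signalFailure) {
        StreamSession session = new StreamSession();
        new Thread(streamTask(session, signalFailure), "Streaming:1").start();
        try {
            return session.await(500, TimeUnit.MILLISECONDS) ? "success" : "failed";
        } catch (TimeoutException e) {
            return "timed out";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println("task never signals failure: " + runOnce(false));
        System.out.println("task signals failure:       " + runOnce(true));
    }
}
```

The sketch combines two independent safeguards: signalling failure from the task's error path, and bounding the wait. Either one alone turns a silent hang into an observable outcome; the bounded wait also covers failures the task itself cannot report, such as the thread dying from an unexpected RuntimeException.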

        Jonathan Ellis added a comment -

        WFM (works for me); marking as duplicate.

        Jonathan Ellis made changes -
        Status: Open [ 1 ] → Resolved [ 5 ]
        Assignee: Yuki Morishita [ yukim ] → (none)
        Fix Version/s: 1.0.9 [ 12319856 ] → (none)
        Resolution: (none) → Duplicate [ 3 ]
        Gavin made changes -
        Workflow: no-reopen-closed, patch-avail [ 12650314 ] → patch-available, re-open possible [ 12748936 ]
        Gavin made changes -
        Workflow: patch-available, re-open possible [ 12748936 ] → reopen-resolved, no closed status, patch-avail, testing [ 12756739 ]

          People

          • Assignee:
            Unassigned
            Reporter:
            Viktor Jevdokimov
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved:
