Cassandra / CASSANDRA-15666

Race condition when completing stream sessions


Details

    • Correctness - Transient Incorrect Response
    • Normal
    • Normal
    • Code Inspection
    • All
    • None

      Added an interceptor to verify stream messages and state transitions.

      CI: https://circleci.com/workflow-run/4d524604-99f8-4c74-b24a-b31e63bce063

      The dtest failure in "repair_test.py" is fixed in https://github.com/apache/cassandra-dtest/pull/63
       
       


    Description

      StreamSession#prepareAsync() executes, as the name implies, asynchronously from the IO thread. This opens the door to race conditions between the sending of the PrepareSynAckMessage and the call to StreamSession#maybeCompleted(). For example, the following could happen:
      1) Node A sends PrepareSynAckMessage from the prepareAsync() thread.
      2) Node B receives it and starts streaming.
      3) Node A receives the streamed file and sends ReceivedMessage.
      4) At this point, if this was the only file to stream, both nodes are ready to close the session via maybeCompleted(), but:
      a) Node A will call it twice, once from the IO thread and once from the prepareAsync() thread of step 1, closing the session and its channels.
      b) Node B will attempt to send a CompleteMessage, but will fail because the session has been closed in the meantime.

      There are other subtle variations of the pattern above, depending on the order of concurrently sent/received messages.
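
      The interleaving in steps 4a and 4b can be replayed deterministically with a toy model. This is only a sketch, not Cassandra's StreamSession: the Peer class, its State enum and the replay() helper are hypothetical, and it assumes that the first call to maybeCompleted() sends a CompleteMessage and waits, while a second call closes the session.

```java
import java.util.ArrayList;
import java.util.List;

public class StreamRaceSketch {
    enum State { STREAMING, WAIT_COMPLETE, CLOSED }

    /** Hypothetical peer; a toy stand-in, not Cassandra's StreamSession. */
    static final class Peer {
        final String name;
        final List<String> inbox = new ArrayList<>();
        State state = State.STREAMING;
        Peer other;

        Peer(String name) {
            this.name = name;
        }

        /** Delivers a message unless the receiving side has already closed. */
        boolean send(String message) {
            if (other.state == State.CLOSED) {
                return false;
            }
            other.inbox.add(message);
            return true;
        }

        /**
         * Assumed behavior: the first call sends a CompleteMessage and waits,
         * a second call closes the session. Returns false if the send failed.
         */
        boolean maybeCompleted() {
            switch (state) {
                case STREAMING:
                    boolean delivered = send("CompleteMessage");
                    state = State.WAIT_COMPLETE;
                    return delivered;
                case WAIT_COMPLETE:
                    state = State.CLOSED;
                    return true;
                default:
                    return true; // already closed, nothing left to do
            }
        }
    }

    /** Replays steps 4a and 4b in a fixed order; returns whether B's CompleteMessage was delivered. */
    static boolean replay(Peer a, Peer b) {
        a.maybeCompleted();        // prepareAsync() thread: sends CompleteMessage
        a.maybeCompleted();        // IO thread: sees WAIT_COMPLETE and closes too early
        return b.maybeCompleted(); // B's CompleteMessage now hits a closed session
    }

    public static void main(String[] args) {
        Peer a = new Peer("A");
        Peer b = new Peer("B");
        a.other = b;
        b.other = a;
        boolean delivered = replay(a, b);
        System.out.println("A state: " + a.state + ", B's CompleteMessage delivered: " + delivered);
    }
}
```

      In this replay Node A ends up closed before Node B's CompleteMessage can be delivered, which matches the failure described in step 4b.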

      I believe the best fix would be to modify the message exchange so that:
      1) Only the "follower" is allowed to send the CompleteMessage.
      2) Only the "initiator" is allowed to close the session and its channels after receiving the CompleteMessage.

      This would also make the message exchange logic easier to reason about, which is a win in its own right.
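
      The two rules above could look roughly like the following toy model. It is a single-threaded sketch under stated assumptions, not the actual patch: the Role and State enums, the Peer class and onCompleteMessage() are hypothetical names, and follower-side cleanup after the initiator closes is left out.

```java
public class StreamCompletionSketch {
    enum Role { INITIATOR, FOLLOWER }
    enum State { STREAMING, WAIT_COMPLETE, CLOSED }

    /** Hypothetical peer; a toy stand-in, not Cassandra's StreamSession. */
    static final class Peer {
        final Role role;
        State state = State.STREAMING;
        Peer other;
        int completeMessagesSent = 0;

        Peer(Role role) {
            this.role = role;
        }

        /**
         * May be called from several places once local work is done.
         * Rule 1: only the follower sends a CompleteMessage, and only once.
         * The initiator never closes from here; it just waits.
         */
        void maybeCompleted() {
            if (state != State.STREAMING) {
                return; // repeated calls are harmless
            }
            state = State.WAIT_COMPLETE;
            if (role == Role.FOLLOWER) {
                completeMessagesSent++;
                other.onCompleteMessage();
            }
        }

        /** Rule 2: only the initiator closes, and only after receiving the CompleteMessage. */
        void onCompleteMessage() {
            if (role != Role.INITIATOR) {
                throw new IllegalStateException("only the initiator expects a CompleteMessage");
            }
            state = State.CLOSED;
        }
    }

    public static void main(String[] args) {
        Peer initiator = new Peer(Role.INITIATOR);
        Peer follower = new Peer(Role.FOLLOWER);
        initiator.other = follower;
        follower.other = initiator;

        // Two calls on the initiator, as in the race scenario: it only waits.
        initiator.maybeCompleted();
        initiator.maybeCompleted();
        System.out.println("Initiator after two calls: " + initiator.state);

        // The follower finishes; its single CompleteMessage lets the initiator close.
        follower.maybeCompleted();
        follower.maybeCompleted();
        System.out.println("Initiator after CompleteMessage: " + initiator.state
                + ", messages sent by follower: " + follower.completeMessagesSent);
    }
}
```

      Because closing is tied to a single event on a single side, calling maybeCompleted() twice on either node no longer changes the outcome in this model.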


          People

            jasonstack Zhao Yang
            sbtourist Sergio Bossa
            Zhao Yang
            Benjamin Lerer, Sergio Bossa
            Votes: 0
            Watchers: 3


              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 20m
