  1. Cassandra
  2. CASSANDRA-15666

Race condition when completing stream sessions

    Details

    • Bug Category:
      Correctness - Transient Incorrect Response
    • Severity:
      Normal
    • Complexity:
      Normal
    • Discovered By:
      Code Inspection
    • Platform:
      All
    • Impacts:
      None
    • Since Version:
    • Source Control Link:
    • Test and Documentation Plan:

      Added interceptor to verify stream messages and state transition.

      CI: https://circleci.com/workflow-run/4d524604-99f8-4c74-b24a-b31e63bce063

      dtest failure "repair_test.py" is fixed in https://github.com/apache/cassandra-dtest/pull/63

      Description

      StreamSession#prepareAsync() executes, as the name implies, asynchronously from the IO thread. This opens the door to race conditions between the sending of the PrepareSynAckMessage and the call to StreamSession#maybeCompleted(). For example, the following could happen:
      1) Node A sends PrepareSynAckMessage from the prepareAsync() thread.
      2) Node B receives it and starts streaming.
      3) Node A receives the streamed file and sends ReceivedMessage.
      4) At this point, if this was the only file to stream, both nodes are ready to close the session via maybeCompleted(), but:
      a) Node A will call it twice, once from the IO thread and once from the prepareAsync() thread of step 1, closing the session and its channels.
      b) Node B will attempt to send a CompleteMessage, but will fail because the session has been closed in the meantime.

      There are other subtle variations of the pattern above, depending on the order of concurrently sent/received messages.
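
      The interleaving in steps 1-4 can be replayed with a small toy model. This is only a sketch under stated assumptions: ToyChannel, ToyNodeA and RaceSketch are hypothetical names, not Cassandra classes, a single shared channel stands in for the real connections, and the two threads on Node A are replayed sequentially in one fixed order rather than run concurrently.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these classes exist in Cassandra.
class ToyChannel {
    private boolean open = true;
    final List<String> delivered = new ArrayList<>();

    void close() { open = false; }

    // Returns false to model a send that fails because the channel is closed.
    boolean send(String message) {
        if (!open) return false;
        delivered.add(message);
        return true;
    }
}

class ToyNodeA {
    final ToyChannel channel;
    int completionCalls = 0;

    ToyNodeA(ToyChannel channel) { this.channel = channel; }

    // Stand-in for maybeCompleted(): with nothing left to stream, it closes
    // the session and its channel, regardless of which thread calls it.
    void maybeCompleted() {
        completionCalls++;
        channel.close();
    }
}

class RaceSketch {
    // Replays steps 1-4 in one fixed order and reports whether
    // Node B's CompleteMessage could still be sent.
    static boolean replay() {
        ToyChannel channel = new ToyChannel();
        ToyNodeA nodeA = new ToyNodeA(channel);

        channel.send("PrepareSynAckMessage"); // step 1: A -> B
        channel.send("streamed file");        // step 2: B -> A
        channel.send("ReceivedMessage");      // step 3: A -> B

        nodeA.maybeCompleted(); // step 4a: call from the IO thread
        nodeA.maybeCompleted(); // step 4a: call from the prepareAsync() thread

        return channel.send("CompleteMessage"); // step 4b: B -> A
    }
}
```

      In this model RaceSketch.replay() returns false: by the time Node B tries to send its CompleteMessage, Node A has already closed the channel.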

      I believe the best fix would be to modify the message exchange so that:
      1) Only the "follower" is allowed to send the CompleteMessage.
      2) Only the "initiator" is allowed to close the session and its channels after receiving the CompleteMessage.

      By doing so, the message exchange logic would be easier to reason about, which is overall a win anyway.
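
      The two rules above could look roughly like the following sketch. It is an illustration of the proposal, not the actual patch: ProtocolSketch, Session, Role and onAllTasksDone() are hypothetical names, messages are delivered by direct method calls instead of network channels, and how the follower tears down its own side is left out because the description does not specify it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the proposed exchange; not Cassandra code.
class ProtocolSketch {
    enum Role { INITIATOR, FOLLOWER }

    static final class Session {
        final Role role;
        Session peer;
        volatile boolean closed = false;
        final List<String> received = new ArrayList<>();
        private final AtomicBoolean completeSent = new AtomicBoolean(false);

        Session(Role role) { this.role = role; }

        // May be reached from several threads once no work is pending.
        // Rule 1: only the follower sends CompleteMessage, and only once.
        // The initiator does nothing here; it waits for the message.
        void onAllTasksDone() {
            if (role == Role.FOLLOWER && completeSent.compareAndSet(false, true)) {
                peer.receive("CompleteMessage");
            }
        }

        // Rule 2: only the initiator closes, and only on CompleteMessage.
        synchronized void receive(String message) {
            if (closed) throw new IllegalStateException("session already closed");
            received.add(message);
            if (role == Role.INITIATOR && message.equals("CompleteMessage")) {
                closed = true;
            }
        }
    }

    // Returns { initiator, follower } after both sides have finished their work.
    static Session[] run() {
        Session initiator = new Session(Role.INITIATOR);
        Session follower = new Session(Role.FOLLOWER);
        initiator.peer = follower;
        follower.peer = initiator;

        // Repeated completion checks on the initiator no longer close anything.
        initiator.onAllTasksDone();
        initiator.onAllTasksDone();

        // Repeated completion checks on the follower send a single CompleteMessage.
        follower.onAllTasksDone();
        follower.onAllTasksDone();

        return new Session[] { initiator, follower };
    }
}
```

      With a single sender of CompleteMessage and a single closer, duplicate completion checks from different threads become harmless: the session can only be closed along one path.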

              People

              • Assignee:
                jasonstack ZhaoYang
                Reporter:
                sbtourist Sergio Bossa
                Authors:
                ZhaoYang
                Reviewers:
                Benjamin Lerer, Sergio Bossa

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m