CASSANDRA-15666

Race condition when completing stream sessions

Details

    • Correctness - Transient Incorrect Response
    • Normal
    • Normal
    • Code Inspection
    • All
    • None

      Added an interceptor to verify stream messages and state transitions.

      CI: https://circleci.com/workflow-run/4d524604-99f8-4c74-b24a-b31e63bce063

      The dtest failure in "repair_test.py" is fixed in https://github.com/apache/cassandra-dtest/pull/63
       
       


    Description

      StreamSession#prepareAsync() executes, as the name implies, asynchronously from the IO thread. This opens the door to race conditions between the sending of the PrepareSynAckMessage and the call to StreamSession#maybeCompleted(). For example, the following could happen:
      1) Node A sends PrepareSynAckMessage from the prepareAsync() thread.
      2) Node B receives it and starts streaming.
      3) Node A receives the streamed file and sends ReceivedMessage.
      4) At this point, if this was the only file to stream, both nodes are ready to close the session via maybeCompleted(), but:
         a) Node A will call it twice, once from the IO thread and once from the prepareAsync() thread of step 1, closing the session and its channels.
         b) Node B will attempt to send a CompleteMessage, but will fail because the session has been closed in the meantime.

      There are other subtle variations of the pattern above, depending on the order of concurrently sent/received messages.
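
      The interleaving in steps 1-4 can be replayed deterministically with a toy model. This is only a sketch: ToySession, its fields, and the two-step behaviour of its maybeCompleted() (first call sends CompleteMessage and waits, second call closes) are assumptions made for illustration, not Cassandra's actual StreamSession code.

```java
import java.util.ArrayList;
import java.util.List;

class StreamRaceSketch {
    /** Hypothetical stand-in for a stream session; not Cassandra's StreamSession. */
    static final class ToySession {
        final String name;
        final List<String> log;
        ToySession peer;
        boolean waitingForComplete = false;
        boolean closed = false;

        ToySession(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        /** Sending fails once either end of the session has been closed. */
        boolean send(String message) {
            if (closed || peer.closed) {
                log.add(name + " could not send " + message);
                return false;
            }
            log.add(name + " sent " + message);
            return true;
        }

        /**
         * Assumed behaviour: the first call sends CompleteMessage and waits,
         * the second call closes the session. Returns false if a send failed.
         */
        boolean maybeCompleted() {
            if (waitingForComplete) {
                closed = true;
                log.add(name + " closed the session");
                return true;
            }
            waitingForComplete = true;
            return send("CompleteMessage");
        }
    }

    static final List<String> LOG = new ArrayList<>();

    /** Replays one unlucky ordering; returns whether B's CompleteMessage got through. */
    static boolean completeMessageDelivered() {
        LOG.clear();
        ToySession a = new ToySession("A", LOG);
        ToySession b = new ToySession("B", LOG);
        a.peer = b;
        b.peer = a;

        a.send("PrepareSynAckMessage"); // step 1: prepareAsync() thread
        b.send("file");                 // step 2: B starts streaming
        a.send("ReceivedMessage");      // step 3: IO thread
        a.maybeCompleted();             // step 4a: called from the IO thread ...
        a.maybeCompleted();             // ... and again from the prepareAsync() thread
        return b.maybeCompleted();      // step 4b: B tries to send CompleteMessage
    }

    public static void main(String[] args) {
        System.out.println("delivered = " + completeMessageDelivered());
        LOG.forEach(System.out::println);
    }
}
```

      In this model the second local call is indistinguishable from "the peer has completed", so A closes before B's CompleteMessage can be sent.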

      I believe the best fix would be to modify the message exchange so that:
      1) Only the "follower" is allowed to send the CompleteMessage.
      2) Only the "initiator" is allowed to close the session and its channels after receiving the CompleteMessage.

      This would also make the message exchange logic easier to reason about, which is a win in its own right.
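
      The two rules of the proposed exchange can be sketched as a small role-based model. Everything here (Role, Peer, onStreamingFinished(), onCompleteMessage()) is a hypothetical name chosen for illustration, and follower-side teardown is deliberately left out; it is not the patch that was committed.

```java
import java.util.ArrayList;
import java.util.List;

class CompletionProtocolSketch {
    enum Role { INITIATOR, FOLLOWER }

    /** Hypothetical peer in the proposed exchange. */
    static final class Peer {
        final Role role;
        final List<String> log;
        Peer other;
        boolean closed = false;

        Peer(Role role, List<String> log) {
            this.role = role;
            this.log = log;
        }

        /** Called whenever a thread notices this peer has nothing left to stream. */
        void onStreamingFinished() {
            if (role == Role.FOLLOWER) {
                // Rule 1: only the follower sends CompleteMessage.
                log.add("FOLLOWER sent CompleteMessage");
                other.onCompleteMessage();
            }
            // The initiator does nothing here, however many threads call this.
        }

        /** Rule 2: only the initiator closes, and only on receiving CompleteMessage. */
        void onCompleteMessage() {
            if (role != Role.INITIATOR)
                throw new IllegalStateException("only the initiator handles CompleteMessage");
            if (closed)
                return; // a repeated delivery is a no-op
            closed = true;
            log.add("INITIATOR closed session");
        }
    }

    /** Runs the exchange with the initiator finishing twice, as in the race above. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        Peer initiator = new Peer(Role.INITIATOR, log);
        Peer follower = new Peer(Role.FOLLOWER, log);
        initiator.other = follower;
        follower.other = initiator;

        initiator.onStreamingFinished(); // e.g. from the IO thread
        initiator.onStreamingFinished(); // e.g. from the prepareAsync() thread
        follower.onStreamingFinished();
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

      Because closing is tied to a single incoming message rather than to how many local threads reach the completion check, the double call on the initiator side no longer has any effect in this model.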

            People

              jasonstack Zhao Yang
              sbtourist Sergio Bossa
              Zhao Yang
              Benjamin Lerer, Sergio Bossa
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m