Cassandra / CASSANDRA-14503

Internode connection management is race-prone


    Details

    • Severity:
      Normal

      Description

      Following CASSANDRA-8457, internode connection management was rewritten to rely on Netty, but the new implementation in OutboundMessagingConnection seems quite race-prone to me, particularly in these two cases:

      • #finishHandshake() racing with #close(): e.g. the former could throw an NPE if the latter has already nulled out channelWriter (this is just one example; other conflicts are possible).
      • Connection timeout and retry racing with state-changing methods: connectionRetryFuture and connectionTimeoutFuture are cancelled when handshaking or closing, but there is no guarantee they will actually be cancelled (they might already be running), so they might end up changing the connection state concurrently with other methods (e.g. by unexpectedly closing the channel or clearing the backlog).
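      The second case can be reproduced in isolation with plain java.util.concurrent. The sketch below is hypothetical: CancelRaceDemo, closedByTimeout and the 10 ms delay are illustrative names and values, not Cassandra code, and a JDK ScheduledExecutorService stands in for the Netty scheduler the real connection uses (Netty's own futures are not exercised here). Once a scheduled task's body has started, cancel(false) still reports success, yet the body runs to completion and applies its side effect.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo, not Cassandra code: a "cancelled" scheduled task can still apply its side effect.
final class CancelRaceDemo {
    /** Returns {result of cancel(false), whether the timeout body still ran its side effect}. */
    static boolean[] run() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch timeoutStarted = new CountDownLatch(1);
        CountDownLatch proceed = new CountDownLatch(1);
        AtomicBoolean closedByTimeout = new AtomicBoolean(false);

        // Stand-in for connectionTimeoutFuture: fires after 10 ms and "closes the channel".
        ScheduledFuture<?> connectionTimeoutFuture = scheduler.schedule(() -> {
            timeoutStarted.countDown();           // the task body is now running
            try {
                proceed.await();                  // park mid-flight while another thread cancels it
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            closedByTimeout.set(true);            // e.g. close the channel or clear the backlog
        }, 10, TimeUnit.MILLISECONDS);

        try {
            timeoutStarted.await();
            // Stand-in for finishHandshake()/close() cancelling the timeout it believes is pending.
            boolean cancelled = connectionTimeoutFuture.cancel(false);
            proceed.countDown();                  // let the already-running body finish
            scheduler.shutdown();
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
            return new boolean[] { cancelled, closedByTimeout.get() };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("cancel(false) returned " + result[0]);
        System.out.println("timeout body still closed the channel: " + result[1]);
    }
}
```

      On a JDK scheduler this prints true on both lines: the future reports itself cancelled (a FutureTask whose body is still running is not yet "completed"), but the side effect happens anyway. Cancelling a future therefore cannot, on its own, guard a state transition.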

      Overall, the thread safety of OutboundMessagingConnection is very difficult to assess given the current implementation. I would suggest refactoring it into a single-threaded model, in which all actions that change connection state are enqueued on a single-threaded scheduler, so that state transitions can be clearly defined and checked.
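      A minimal sketch of the suggested single-threaded model, under stated assumptions: SingleThreadedConnection, its State enum, the attempt counter and the fall-back-to-IDLE timeout policy are all hypothetical and do not reflect the actual OutboundMessagingConnection API. A JDK single-threaded ScheduledExecutorService stands in for whatever scheduler a real implementation would use (a channel's Netty event loop could presumably play the same role). Every public method only enqueues work, and each transition re-checks the current state on the scheduler thread, so a finishHandshake() that arrives after close() becomes a no-op instead of an NPE.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a single-threaded connection state machine; not Cassandra code.
final class SingleThreadedConnection {
    enum State { IDLE, CONNECTING, ESTABLISHED, CLOSED }

    // All reads and writes of the fields below happen on this one thread, so no locks are needed.
    private final ScheduledExecutorService eventLoop = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "connection-event-loop");
        t.setDaemon(true); // do not keep the JVM alive just for this sketch
        return t;
    });
    private State state = State.IDLE;
    private int attempt = 0;                 // bumped per connect() so stale timeouts can tell they are obsolete
    private ScheduledFuture<?> timeoutFuture;

    void connect(long timeoutMillis) {
        eventLoop.execute(() -> {
            if (state != State.IDLE) return; // only IDLE -> CONNECTING is allowed here
            state = State.CONNECTING;
            final int thisAttempt = ++attempt;
            timeoutFuture = eventLoop.schedule(() -> onTimeout(thisAttempt), timeoutMillis, TimeUnit.MILLISECONDS);
        });
    }

    void finishHandshake() {
        eventLoop.execute(() -> {
            if (state != State.CONNECTING) return; // e.g. close() already ran: ignore instead of hitting an NPE
            state = State.ESTABLISHED;
            timeoutFuture.cancel(false);           // best effort only; onTimeout() re-checks the state anyway
        });
    }

    void close() {
        eventLoop.execute(() -> {
            if (timeoutFuture != null) timeoutFuture.cancel(false);
            state = State.CLOSED;
        });
    }

    // Runs on the event-loop thread, so it cannot interleave with finishHandshake() or close().
    private void onTimeout(int forAttempt) {
        if (state != State.CONNECTING || forAttempt != attempt) return; // stale timeout: no-op
        state = State.IDLE; // hypothetical policy: go back to IDLE so a retry can call connect() again
    }

    // Reads the state on the event-loop thread, after every action previously enqueued via execute().
    State currentState() {
        try {
            return eventLoop.submit(() -> state).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    void shutdown() {
        eventLoop.shutdownNow();
    }
}
```

      Because onTimeout() runs on the same thread and re-checks both the state and the attempt number, cancel(false) becomes a best-effort optimisation: a timeout that was not cancelled in time simply finds the connection no longer CONNECTING and does nothing, rather than closing the channel concurrently with another transition.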


              People

              • Assignee:
                Jason Brown (jasobrown)
                Reporter:
                Sergio Bossa (sbtourist)
                Authors:
                Jason Brown
                Reviewers:
                Benedict Elliott Smith, Dinesh Joshi
              • Votes:
                0
                Watchers:
                14


                  Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 0.5h