CASSANDRA-4051: Stream sessions can only fail via the FailureDetector

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.1.0
    • Component/s: Core

      Description

      If FileStreamTask itself fails more times than the retry limit allows while gossip continues to work, the stream session is never closed. This is unlikely in practice, since it requires the storage port to refuse new connections while keeping existing ones open. For the bulk loader, however, it is especially problematic: the bulk loader has no access to a failure detector and thus no way of knowing that a session failed.
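
The failure mode described above can be illustrated with a minimal sketch of a bounded retry loop that always closes the session once attempts run out, instead of relying on a failure detector to notice. This is not Cassandra's code: `Session`, `Transfer`, and `streamWithRetries` are hypothetical names introduced for illustration, and the limit of 8 attempts is taken from the comment thread.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

public class BoundedRetryStream {
    /** Hypothetical stand-in for a stream session; not Cassandra's StreamOutSession. */
    public static class Session {
        private final AtomicBoolean closed = new AtomicBoolean(false);
        private volatile boolean success;

        /** Closes the session once; later calls are ignored. */
        public void close(boolean success) {
            if (closed.compareAndSet(false, true)) {
                this.success = success;
            }
        }

        public boolean isClosed() { return closed.get(); }
        public boolean succeeded() { return success; }
    }

    /** Hypothetical transfer step that may fail with an IOException. */
    public interface Transfer {
        void run() throws IOException;
    }

    /**
     * Tries the transfer up to maxAttempts times. The session is closed on
     * every exit path, so a caller without a failure detector still learns
     * the outcome. Returns the number of attempts made.
     */
    public static int streamWithRetries(Session session, Transfer transfer, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                transfer.run();
                session.close(true);
                return attempt;
            } catch (IOException e) {
                // Swallow and retry; the failure is reported after the loop.
            }
        }
        session.close(false); // retries exhausted: fail the session explicitly
        return maxAttempts;
    }

    public static void main(String[] args) {
        Session session = new Session();
        int attempts = streamWithRetries(session, () -> {
            throw new IOException("connection refused");
        }, 8);
        System.out.println("attempts=" + attempts
                + " closed=" + session.isClosed()
                + " succeeded=" + session.succeeded());
    }
}
```

The point of the sketch is only that the session is closed on the exhausted-retries path; how the real patch signals failure to the remote side is not shown here.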

      1. 4051.txt
        32 kB
        Brandon Williams
      2. 4051-v2.txt
        32 kB
        Yuki Morishita
      3. 4051-v3.txt
        6 kB
        Yuki Morishita

          Activity

          Brandon Williams added a comment -

          Committed.

          Yuki Morishita added a comment -

          v3 attached for 1.1 branch.

          It basically catches IOException on both sides and closes the sessions.
          I also implemented IStreamCallback#onFailure so that the latches count down and the process does not hang.
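
A minimal sketch of the latch idea in the comment above, assuming a callback with both a success and a failure hook: if only the success path counted the latch down, a failed session would leave the waiting thread blocked forever. The `StreamCallback` and `LatchCallback` types here are hypothetical stand-ins; the thread confirms only that `IStreamCallback` has an `onFailure` method, not its full shape.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchCallbackSketch {
    /** Hypothetical callback shape, modeled loosely on IStreamCallback. */
    public interface StreamCallback {
        void onSuccess();
        void onFailure();
    }

    /** Counts the shared latch down on either outcome, so waiters never hang. */
    public static class LatchCallback implements StreamCallback {
        private final CountDownLatch latch;
        private volatile boolean failed;

        public LatchCallback(CountDownLatch latch) {
            this.latch = latch;
        }

        @Override
        public void onSuccess() {
            latch.countDown();
        }

        @Override
        public void onFailure() {
            failed = true;     // record the failure before releasing waiters
            latch.countDown(); // without this, await() would block forever
        }

        public boolean hasFailed() { return failed; }
    }

    public static void main(String[] args) throws InterruptedException {
        // One latch slot per stream session being waited on.
        CountDownLatch latch = new CountDownLatch(2);
        LatchCallback first = new LatchCallback(latch);
        LatchCallback second = new LatchCallback(latch);

        new Thread(first::onSuccess).start();
        new Thread(second::onFailure).start();

        boolean done = latch.await(5, TimeUnit.SECONDS);
        System.out.println("done=" + done
                + " anyFailed=" + (first.hasFailed() || second.hasFailed()));
    }
}
```

After the latch reaches zero, the waiting code can inspect each callback to decide whether the overall operation succeeded.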

          Brandon Williams added a comment -

          Reopening because this only fixes the problem in one way: FileStreamTask can still fail all 8 times and never close the session. In general, outbound streaming's "fire and forget" methodology is problematic for bulk loading.

          Brandon Williams added a comment -

          BOF looks good, +1, committed.

          Yuki Morishita added a comment -

          Patch attached, based on CASSANDRA-3817, with a retry limit.
          (I think it would be nice to have a retry limit per stream session, so that we could configure, say, no retries for bulk loading, which I think is enough. But that's beyond the scope of this issue.)

          > Brandon

          Can you test and see if BOF is OK with this patch?

          Yuki Morishita added a comment -

          Since CASSANDRA-3216 added IEndpointStateChangeSubscriber and IFailureDetectionEventListener to StreamOutSession, we need to keep that functionality. On CASSANDRA-3817 I proposed a modified version of the CASSANDRA-3112 patch that leaves out the retry-limiting part. I would like to rebase that patch and add the retry limit so that I can post it here. (I will post it soon.)

          Brandon Williams added a comment -

          Updated patch, extracted as mentioned. It doesn't change any streaming behavior, but it does provide a way to detect errors that CASSANDRA-3112 and CASSANDRA-4045 can build on.

          Brandon Williams added a comment -

          It looks like we could extract/rebase the streaming changes from CASSANDRA-3112's first patch to solve this well enough for the bulk loader and BOF.


            People

            • Assignee: Yuki Morishita
            • Reporter: Brandon Williams
            • Reviewer: Brandon Williams