Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-3316

Add a JMX call to force cleaning repair sessions (in case they are hang up)

Agile BoardAttach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Low
    • Resolution: Fixed
    • Fix Version/s: 1.0.2
    • Component/s: None
    • Labels:
      None

      Description

      A repair session contains many parts, most of which are not local to the node (implying the node waits on those operation). You request merkle trees, then you schedule streaming (and in 1.0.0, some of the streaming don't involve the local node itself). It's lots of place where something can go wrong, and if so it leaves the repair hanging and as a consequence it leaves a repairSessions tasks sitting active on the 'AntiEntropy Session' executor.

      Obviously, we should improve the detection by repair of those things that can go wrong. CASSANDRA-2433 started and CASSANDRA-3112 is open to fill as much of the remaining parts as possible, but my bet is that it will be hard to cover everything (and it may not be worth of handling very improbable failure scenario). Besides CASSANDRA-3112 will involve change in the wire protocol, so it may take some time to be committed. In the meantime, it would be nice to provide a JMX call to force terminating repairSessions so that you don't end up in the case where you have enough 'zombie' sessions on the executor that you can't submit new ones (you could restart the node but it's ugly). Anyway, it's not a big issue but it would be simple to add such a JMX call.

        Attachments

          Activity

            People

            • Assignee:
              yukim Yuki Morishita Assign to me
              Reporter:
              slebresne Sylvain Lebresne
              Authors:
              Yuki Morishita
              Reviewers:
              Sylvain Lebresne

              Dates

              • Created:
                Updated:
                Resolved:

                Issue deployment