Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-3316

Add a JMX call to force cleaning repair sessions (in case they are hang up)

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Low
    • Resolution: Fixed
    • Fix Version/s: 1.0.2
    • Component/s: None
    • Labels:
      None

      Description

      A repair session contains many parts, most of which are not local to the node (implying the node waits on those operation). You request merkle trees, then you schedule streaming (and in 1.0.0, some of the streaming don't involve the local node itself). It's lots of place where something can go wrong, and if so it leaves the repair hanging and as a consequence it leaves a repairSessions tasks sitting active on the 'AntiEntropy Session' executor.

      Obviously, we should improve the detection by repair of those things that can go wrong. CASSANDRA-2433 started and CASSANDRA-3112 is open to fill as much of the remaining parts as possible, but my bet is that it will be hard to cover everything (and it may not be worth of handling very improbable failure scenario). Besides CASSANDRA-3112 will involve change in the wire protocol, so it may take some time to be committed. In the meantime, it would be nice to provide a JMX call to force terminating repairSessions so that you don't end up in the case where you have enough 'zombie' sessions on the executor that you can't submit new ones (you could restart the node but it's ugly). Anyway, it's not a big issue but it would be simple to add such a JMX call.

        Attachments

        1. 3316-v1.txt
          5 kB
          Yuki Morishita

          Activity

            People

            • Assignee:
              yukim Yuki Morishita
              Reporter:
              slebresne Sylvain Lebresne
              Authors:
              Yuki Morishita
              Reviewers:
              Sylvain Lebresne
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: