CASSANDRA-2433

Failed Streams Break Repair

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: 0.8.5
    • None
    • Normal

    Description

      When a stream fails during repair, we see multiple problems:

      1. Although retry is initiated and completes, the old stream doesn't seem to clean itself up and repair hangs.
      2. The temp files are left behind and multiple failures can end up filling up the data partition.
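
      The two symptoms above can be illustrated with a minimal, language-agnostic sketch in Python. It is not Cassandra's code and not the attached patches: the class `StreamSession`, the exception `StreamFailed`, and the function `demo` are hypothetical names made up for illustration. The idea, assuming a failed stream is given a way to notify whoever is waiting on it, is that the failure path both deletes the partial temp file and wakes the waiter with an error instead of leaving it blocked.

```python
import os
import tempfile
import threading


class StreamFailed(Exception):
    """Raised to the waiting caller when a stream reports failure."""


class StreamSession:
    """Hypothetical stand-in for one incoming stream of a repair."""

    def __init__(self, tmp_path):
        self.tmp_path = tmp_path
        self._done = threading.Event()
        self._error = None

    def complete(self):
        # Success path: just wake up the waiter.
        self._done.set()

    def fail(self, error):
        # Remove the partial temp file so repeated failures cannot fill the disk.
        try:
            os.remove(self.tmp_path)
        except FileNotFoundError:
            pass
        # Record the error and wake the waiter so it does not hang.
        self._error = error
        self._done.set()

    def wait(self, timeout=None):
        # Block until the stream completes or fails; surface failures.
        if not self._done.wait(timeout):
            raise TimeoutError("stream did not finish")
        if self._error is not None:
            raise StreamFailed(str(self._error))


def demo():
    # Create a stand-in temp file, fail the stream, and observe the result.
    fd, path = tempfile.mkstemp(suffix="-tmp-Data.db")
    os.close(fd)
    session = StreamSession(path)
    session.fail(IOError("connection reset"))
    try:
        session.wait(timeout=1)
    except StreamFailed as exc:
        return os.path.exists(path), str(exc)
    return os.path.exists(path), None
```

      In this sketch the waiter gets an explicit error rather than blocking forever, and the temp file is gone by the time the error is seen. Whether a retry should reuse or replace the failed session is left open here.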

      Together, these issues make repair very difficult for nearly everyone running it on a non-trivially sized data set.

      This is also being worked on as part of CASSANDRA-2088, but that ticket was moved to 0.8 for a few reasons. This ticket is to fix the immediate issues we are seeing in 0.7.

      Attachments

        1. 0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re-v4.patch
          16 kB
          Sylvain Lebresne
        2. 0002-Register-in-gossip-to-handle-node-failures-v4.patch
          10 kB
          Sylvain Lebresne
        3. 0003-Report-streaming-errors-back-to-repair-v4.patch
          31 kB
          Sylvain Lebresne
        4. 0004-Reports-validation-compaction-errors-back-to-repair-v4.patch
          9 kB
          Sylvain Lebresne
        5. 2433.patch
          23 kB
          Sylvain Lebresne
        6. 2433_v2.patch
          29 kB
          Sylvain Lebresne
        7. 2433_v3.patch
          28 kB
          Sylvain Lebresne
        8. 2433_v4.patch
          28 kB
          Sylvain Lebresne

            People

              Assignee: Sylvain Lebresne
              Reporter: Benjamin Coverston
              Authors: Sylvain Lebresne
              Reviewers: Jonathan Ellis
              Votes: 5
              Watchers: 10
