  1. Cassandra
  2. CASSANDRA-15027

Handle IR prepare phase failures less race prone by waiting for all results




      Handling incremental repairs as a coordinator begins by sending a PrepareConsistentRequest message to all participants, which may also include the coordinator itself. Participants will run anti-compactions upon receiving such a message and report the result of the operation back to the coordinator.

      Once we receive a failure response from any of the participants, we fail fast in CoordinatorSession.handlePrepareResponse(), which in turn completes the prepareFuture that RepairRunnable is blocking on. The repair command then terminates with an error status, as expected.

      The issue is that when the node is both coordinator and participant, we may end up with a local session and submitted anti-compactions that will be executed without any coordination with the coordinator session (on the same node). As a result, running repair commands one right after another may cause overlapping execution of anti-compactions, which causes the following (misleading) message to show up in the logs and the repair to fail again:
      "Prepare phase for incremental repair session %s has failed because it encountered intersecting sstables belonging to another incremental repair session (%s). This is caused by starting an incremental repair session before a previous one has completed. Check nodetool repair_admin for hung sessions and fix them."
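
The idea in the issue title, waiting for all prepare results instead of failing fast, can be sketched roughly as below. This is only an illustration: PrepareTracker, its fields, and the String participant ids are hypothetical names and not taken from Cassandra's CoordinatorSession or from the actual patch. The sketch assumes that each participant reports exactly one success or failure result.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for coordinator-side bookkeeping; not Cassandra's CoordinatorSession. */
final class PrepareTracker {
    // Participants that have not reported a prepare result yet.
    private final Set<String> pending;
    // Remembers whether any participant reported a failure.
    private boolean anyFailed = false;
    // Completed with true (all prepared) or false (at least one failed).
    private final CompletableFuture<Boolean> prepareFuture = new CompletableFuture<>();

    PrepareTracker(Set<String> participants) {
        this.pending = new HashSet<>(participants);
    }

    /** Records one result; the future completes only after the last participant has answered. */
    synchronized void handlePrepareResponse(String participant, boolean success) {
        if (!pending.remove(participant)) {
            return; // unknown or duplicate response: ignore
        }
        if (!success) {
            anyFailed = true; // note the failure, but keep waiting for the others
        }
        if (pending.isEmpty()) {
            prepareFuture.complete(!anyFailed);
        }
    }

    CompletableFuture<Boolean> prepareFuture() {
        return prepareFuture;
    }
}
```

With this shape, a failure from one participant no longer unblocks the caller while other participants (possibly including the local node) may still be running anti-compactions; the caller is only released once every participant has reported.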




            • Assignee:
              Stefan Podkowinski
            • Votes:
              0
            • Watchers:
              2


              • Created: