Description
The "snapshotLatch.await();" call can wait forever, blocking all repair operations indefinitely, if another node never responds to its snapshot request:
    public void makeSnapshots(Collection<InetAddress> endpoints)
    {
        try
        {
            snapshotLatch = new CountDownLatch(endpoints.size());
            IAsyncCallback callback = new IAsyncCallback()
            {
                public boolean isLatencyForSnitch()
                {
                    return false;
                }

                public void response(MessageIn msg)
                {
                    RepairJob.this.snapshotLatch.countDown();
                }
            };
            for (InetAddress endpoint : endpoints)
                MessagingService.instance().sendRR(new SnapshotCommand(tablename, cfname, sessionName, false).createMessage(), endpoint, callback);
            snapshotLatch.await();
            snapshotLatch = null;
        }
        catch (InterruptedException e)
        {
            throw new RuntimeException(e);
        }
    }
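One way to avoid the indefinite wait is to bound it with CountDownLatch.await(long, TimeUnit), which returns false when the timeout elapses before every endpoint has replied. The sketch below is a standalone illustration, not the actual Cassandra fix: the string endpoints, the BiConsumer "sender" standing in for MessagingService.sendRR, and the timeout values are all hypothetical.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

public final class BoundedSnapshotWait {
    /**
     * Sends one (hypothetical) snapshot request per endpoint and waits at most
     * timeoutMs for all replies. Returns true only if every endpoint answered.
     */
    static boolean makeSnapshots(List<String> endpoints,
                                 BiConsumer<String, Runnable> sendRequest,
                                 long timeoutMs) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(endpoints.size());
        for (String endpoint : endpoints) {
            // The Runnable plays the role of IAsyncCallback.response(): count down on reply.
            sendRequest.accept(endpoint, latch::countDown);
        }
        // Unlike await(), await(long, TimeUnit) gives up and returns false
        // once the timeout elapses, so a silent node cannot hang the caller.
        return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newCachedThreadPool();
        // Hypothetical sender: every endpoint replies asynchronously except "10.0.0.3".
        BiConsumer<String, Runnable> sender = (endpoint, onReply) -> {
            if (!endpoint.equals("10.0.0.3")) {
                pool.submit(onReply);
            }
        };
        boolean allReplied = makeSnapshots(List.of("10.0.0.1", "10.0.0.2"), sender, 1000);
        boolean withSilentNode = makeSnapshots(List.of("10.0.0.1", "10.0.0.3"), sender, 200);
        System.out.println(allReplied + " " + withSilentNode); // prints "true false"
        pool.shutdown();
    }
}
```

When the bounded wait returns false, the caller could fail the repair session with an explicit error instead of blocking every later repair; how long to wait is a policy choice this sketch does not settle.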
Issue Links
- relates to: CASSANDRA-18748 Transient disk failure could incur snapshot repair block forever (Open)