Adding notes from discussion on #cassandra-dev:
There are legitimate production use cases that would benefit from being able to skip snapshots, both for truncate and dropColumnFamily.
For example, our application has some ColumnFamilies that contain important data, and others that contain large volumes of derived data. We fairly often need to discard the derived data in order to recompute it. Truncating without a snapshot would be ideal for this.
While we could proactively prune snapshots, coordinating that housekeeping from a separate process is complex: to tell "important" snapshots apart from unwanted ones, the pruning process would need knowledge of our domain model.
Brandon stressed that the current philosophy is that Cassandra should never truly delete data, as a protection against accidental data loss. My own opinion is that Cassandra and its API should not work against the caller's intent.
A compromise that we talked about is as follows:
1. Additional methods for truncate and dropColumnFamily that include a snapshot control flag, e.g.:
truncate(String keySpace, String cf, boolean withSnapshot)
dropColumnFamily(String keySpace, String cf, boolean withSnapshot)
2. Add a "safety" configuration option that governs whether the snapshot flag is actually honored.
So if a caller passes withSnapshot=false while allowSkippedSnapshotOnDeletion is false (i.e. !withSnapshot && !allowSkippedSnapshotOnDeletion), we could either reject the request as illegal or log a warning and take the snapshot anyway.
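A minimal Java sketch of the gating logic in item 2, assuming a hypothetical helper that truncate(keySpace, cf, withSnapshot) and dropColumnFamily(keySpace, cf, withSnapshot) would consult before deleting anything. SnapshotPolicy, OnDisallowedSkip and shouldSnapshot are illustrative names, not existing Cassandra API; both outcomes discussed for the disallowed case (fail hard, or warn and snapshot anyway) are selectable.

```java
import java.util.logging.Logger;

// Hypothetical sketch: none of these names exist in Cassandra.
class SnapshotPolicy {
    /** What to do when a caller asks to skip a snapshot but the operator has not allowed it. */
    enum OnDisallowedSkip { FAIL, WARN_AND_SNAPSHOT }

    private static final Logger LOG = Logger.getLogger(SnapshotPolicy.class.getName());

    // Stand-in for the proposed allowSkippedSnapshotOnDeletion option (default false).
    private final boolean allowSkippedSnapshotOnDeletion;
    private final OnDisallowedSkip onDisallowedSkip;

    SnapshotPolicy(boolean allowSkippedSnapshotOnDeletion, OnDisallowedSkip onDisallowedSkip) {
        this.allowSkippedSnapshotOnDeletion = allowSkippedSnapshotOnDeletion;
        this.onDisallowedSkip = onDisallowedSkip;
    }

    /**
     * Decides whether truncate/dropColumnFamily should snapshot first.
     * Returns true if a snapshot must be taken; throws IllegalArgumentException
     * if skipping is disallowed and the policy is FAIL.
     */
    boolean shouldSnapshot(String keySpace, String cf, boolean withSnapshot) {
        if (withSnapshot || allowSkippedSnapshotOnDeletion) {
            // The caller's intent is honored as-is.
            return withSnapshot;
        }
        // Here: !withSnapshot && !allowSkippedSnapshotOnDeletion
        if (onDisallowedSkip == OnDisallowedSkip.FAIL) {
            throw new IllegalArgumentException("Skipping the snapshot for " + keySpace + "." + cf
                    + " is disabled (allowSkippedSnapshotOnDeletion=false)");
        }
        LOG.warning("Snapshot skip requested for " + keySpace + "." + cf
                + " but not allowed; snapshotting anyway");
        return true;
    }
}
```

Keeping the decision in one place means the same check can back both truncate and dropColumnFamily, and the default (option false) preserves the current never-truly-delete behavior.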
The advantage of this approach is that it would be equally useful for testing and production. The disadvantage is that it is an API change and so would have to come in a later release.
An interim solution might be to add only the configuration option (e.g. allowSkippedSnapshotOnDeletion, default false) and have truncate and dropColumnFamily honor that value directly. In a later release, the "withSnapshot" API additions could use allowSkippedSnapshotOnDeletion as described above in #2.
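The interim option could look roughly like the following: truncate and dropColumnFamily keep their current signatures and consult the configuration value directly. InterimDeletion and its action log are hypothetical stand-ins for illustration (the log replaces real snapshot and delete work); only the option name comes from the discussion above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the interim behavior: no per-call flag, only the config option.
class InterimDeletion {
    // Stand-in for allowSkippedSnapshotOnDeletion (default false keeps today's behavior).
    private final boolean allowSkippedSnapshotOnDeletion;
    // Records what would happen, in order; a real implementation would touch disk instead.
    final List<String> actions = new ArrayList<>();

    InterimDeletion(boolean allowSkippedSnapshotOnDeletion) {
        this.allowSkippedSnapshotOnDeletion = allowSkippedSnapshotOnDeletion;
    }

    void truncate(String keySpace, String cf) {
        snapshotUnlessSkipAllowed(keySpace, cf);
        actions.add("truncate " + keySpace + "." + cf);
    }

    void dropColumnFamily(String keySpace, String cf) {
        snapshotUnlessSkipAllowed(keySpace, cf);
        actions.add("drop " + keySpace + "." + cf);
    }

    // Snapshot first unless the operator has opted out cluster-wide.
    private void snapshotUnlessSkipAllowed(String keySpace, String cf) {
        if (!allowSkippedSnapshotOnDeletion) {
            actions.add("snapshot " + keySpace + "." + cf);
        }
    }
}
```

The trade-off is that the option applies to every ColumnFamily on the node, so it suits test clusters or deployments where all truncated data is derived; per-call control would only arrive with the later withSnapshot overloads.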