Cassandra / CASSANDRA-3710

Add a configuration option to disable snapshots

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: 1.0.10, 1.1.0
    • Component/s: None
    • Labels: None

      Description

      Let me first say, I hate this idea. It gives Cassandra the ability to permanently delete data at a large scale without any means of recovery. However, I've seen this requested multiple times, and it is in fact useful in some scenarios, such as when your application uses an embedded Cassandra instance for testing and needs to truncate, which without JNA will time out more often than not.

      1. auto_snapshot_2.diff
        4 kB
        Dave Brosius
      2. auto_snapshot_3.diff
        4 kB
        Dave Brosius
      3. auto_snapshot.diff
        3 kB
        Dave Brosius
      4. Cassandra107Patch_TestModeV1.txt
        13 kB
        Christian Spriegel

          Activity

          Jonathan Ellis added a comment -

          So... install JNA?

          Christian Spriegel added a comment -

          I am one guy who has an issue with snapshots triggered by truncate (Brandon thinks that snapshotting is causing my problem).

          I guess there is no real reason why anyone would need that. It's just that when embedding Cassandra into a test suite, snapshotting data is simply not needed, and it makes the setup harder.

          For me snapshots are still not working, even though I have JNA in my classpath. I guess there must be something wrong with maven-surefire, but that is not the topic here.

          Jonathan Ellis added a comment -

          Another option would be to use Java7: CASSANDRA-3734

          Christian Spriegel added a comment -

          I think I came up with a real use case for this: It would be useful for development or quality systems. I personally don't care if data gets lost on my development machines, but I don't like deleting snapshots on all those machines.

          AFAIK, schema updates also create snapshots.

          Jonathan: Would it help if somebody provided a patch for this, or are you against it?

          Christian Spriegel added a comment -

          Update on my test suite: I was just able to get JNA running in my JUnit tests, so truncate is finally working, which is good.

          Unfortunately, the performance of my test suite is not good any more. It now takes 25 minutes instead of 3 minutes to run all my tests. The hard disk was busy all the time (top showed ~10% I/O wait).
          I checked for snapshots in the Cassandra directory: there were about 2300 folders within the snapshots directory, containing ~30k files in total. I assume that the performance decrease is due to the 30k hard links being created.

          Christian Spriegel added a comment -

          I looked into it a bit deeper and I must admit it's much more complicated than I thought. The performance penalty is not just due to the hard links, but also due to the memtable flushes caused by the truncate.

          What we do:
          We run tests from multiple threads to improve speed. Every thread uses a separate keyspace. Between tests, the current keyspace is reset.

          What happens within Cassandra:
          A truncate triggered from one test thread now flushes the memtables of all other test threads too, which causes the test threads to fsync each other.

          How I tried to resolve this:
          I removed the flush and hard link from truncate, which made it run much faster. Unfortunately, the commitlog still has all the data in it, and I do not see a way to drop the commitlog data without flushing all CFs.

          Well, I guess I will make a patch for our internal testsuite. Is anybody out there who has the same problem, but maybe a better approach?

          kind regards,
          Christian

          Peter Sanford added a comment -

          We hit the same issue with our integration tests. Originally we would truncate between each test. As our test suite continued to grow, the time penalty became too large for us to continue doing so.

          We experimented with deleting all rows from all column families instead of truncating (inspired by this blog post: http://dev.aboutus.org/2011/08/22/cassandra-truncate-means-slow-test-suites.html). That was faster than truncating but still not as fast as we would like (performance degraded linearly over the course of a test run).

          We eventually tweaked our tests so that each test is guaranteed to use a set of row keys distinct from those of any previous test, effectively allowing us to ignore the problem. This was fairly simple for us because all of our row keys are prefixed with a UUID. Obviously this solution depends on the data model you are using and will not work for everyone.
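The unique-row-key approach described above could look roughly like the following. This is only a minimal sketch: the class `TestKeyNamespace` and its methods are hypothetical names made up for illustration, not part of Cassandra or of the commenter's test suite, and it assumes row keys are plain strings.

```java
import java.util.UUID;

/** Hypothetical helper: gives every test its own row-key namespace. */
final class TestKeyNamespace {
    private final String prefix;

    TestKeyNamespace() {
        // A fresh random UUID per test; collisions between tests are practically impossible.
        this.prefix = UUID.randomUUID().toString();
    }

    /** Builds a row key that will not collide with keys written by other tests. */
    String rowKey(String logicalKey) {
        return prefix + ":" + logicalKey;
    }

    /** Returns true if the given row key was produced by this namespace. */
    boolean owns(String rowKey) {
        return rowKey.startsWith(prefix + ":");
    }
}
```

Each test would create one `TestKeyNamespace` in its setup and route every key through `rowKey(...)`, so stale rows from earlier tests are never read and no truncate is needed between tests.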

          There are use cases where a patch like this would be helpful. Obviously that needs to be balanced against the cost of maintaining code that is not used in production rings.

          Christian Spriegel added a comment -

          added 'testmode' patch

          Christian Spriegel added a comment -

          Hi Peter,
          it is good to know that we are not the only ones having this problem.

          The approaches you described are not really suitable for our application. Empty or old rows would confuse the application. That's why I created a pretty radical patch for Cassandra:

          It adds a new config setting called 'test_mode_enabled'. If set to true, it disables the commitlog, disables snapshots, and disables memtable flushes for truncates.

          I uploaded it, maybe this is useful for your tests too.

          Christian

          Jonathan Ellis added a comment -

          It sounds like you really want a way to "mock" Cassandra. Have you checked out https://github.com/riptano/Cassanova ? It probably needs some updating but it might be a good fit.

          Christian Spriegel added a comment -

          Yes, some dummy Cassandra that is easily resettable. Cassanova looks interesting, thanks for sharing the link. Unfortunately, it lacks support for secondary indexes and TTL. Currently, the best solution for me seems to be to patch Cassandra and deploy it to our Maven repo. All developers will get the patched version via Maven, which will then start itself from within the test suite. I'm a Java guy; I'd rather patch Cassandra than integrate Python.

          btw: The patched cassandra works actually quite fast:
          unpatched, 2 testthreads: 632 sec
          unpatched, 4 testthreads: 1470 sec
          patched, 2 testthreads: 350 sec
          patched, 4 testthreads: 243 sec
          (We use testng which allows tests to be run in multiple threads)

          Christian

          Chris Herron added a comment -

          Adding notes from discussion on #cassandra-dev:

          There are legitimate production use cases that would benefit from being able to skip snapshots, both for truncate and dropColumnFamily.

          For example, our application has some ColumnFamilies that contain important data, and others that contain large volumes of derived data. Relatively often, we have the need to discard the derived data in order to recompute it. Truncate without snapshot would be ideal for this.

          While we could proactively prune snapshots, the coordination of such housekeeping out-of-process is complex. Differentiating "important" snapshots from unwanted snapshots when pruning requires that this process has knowledge of our domain model.

          Brandon stressed that the current philosophy is that Cassandra should never truly delete data, as a protection against accidental data loss. My own opinion is that Cassandra and its API should not work against the caller's intent.

          A compromise that we talked about is as follows:

          1. Additional methods for truncate and dropColumnFamily that include a snapshot control flag, e.g.:

          truncate(String keySpace, String cf, boolean withSnapshot)
          dropColumnFamily(String keySpace, String cf, boolean withSnapshot)

          2. Add a "safety" configuration option that governs whether the snapshot flag is actually honored.
          So if(!withSnapshot && !allowSkippedSnapshotOnDeletion), we could either fail hard for an illegal request or log a warning and snapshot anyway.

          The advantage of this approach is that it would be equally useful for testing and production. The disadvantage is that it is an API change and so would have to come in a later release.

          An interim solution might be to add the configuration option (e.g. allowSkippedSnapshotOnDeletion, default false) and have truncate and delete honor that value directly. In a later release, the "withSnapshot" API additions could use allowSkippedSnapshotOnDeletion as described above in #2.
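The decision rule in the compromise above (a per-call withSnapshot flag, guarded by a allowSkippedSnapshotOnDeletion safety option) can be sketched as follows. This is an illustration under stated assumptions, not Cassandra's actual API: the class `SnapshotPolicy`, the enum `OnViolation`, and the method `shouldSnapshot` are hypothetical names; only `withSnapshot` and `allowSkippedSnapshotOnDeletion` come from the proposal itself.

```java
/** Hypothetical sketch of the proposed snapshot control; not Cassandra's actual API. */
final class SnapshotPolicy {
    /** What to do when a caller asks to skip a snapshot but the configuration forbids it. */
    enum OnViolation { FAIL, WARN_AND_SNAPSHOT }

    private final boolean allowSkippedSnapshotOnDeletion;
    private final OnViolation onViolation;

    SnapshotPolicy(boolean allowSkippedSnapshotOnDeletion, OnViolation onViolation) {
        this.allowSkippedSnapshotOnDeletion = allowSkippedSnapshotOnDeletion;
        this.onViolation = onViolation;
    }

    /** Decides whether a truncate or drop should take a snapshot first. */
    boolean shouldSnapshot(boolean withSnapshot) {
        if (withSnapshot) {
            return true; // the caller wants a snapshot, nothing to check
        }
        if (allowSkippedSnapshotOnDeletion) {
            return false; // the caller opted out and the safety option permits it
        }
        // The caller opted out, but the safety option forbids skipping.
        if (onViolation == OnViolation.FAIL) {
            throw new IllegalStateException("Skipping snapshots is disabled by configuration");
        }
        System.err.println("WARN: snapshot skip requested but not allowed; snapshotting anyway");
        return true;
    }
}
```

With the safety option at its proposed default of false, a call with `withSnapshot = false` either fails or still snapshots, so data is never dropped without a snapshot unless the operator has explicitly allowed it.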

          Jonathan Ellis added a comment -

          I'm fine with adding an autosnapshot configuration variable defaulting to true that controls whether to snapshot before truncate and drop.

          Christian Spriegel added a comment -

          disableSnapshots would be great for our D/Q system, even though it won't solve the issues for my automated tests (which I fixed myself with my own patch). So I'd still appreciate it. Thanks!

          Jonathan Ellis added a comment -

          Dave, can you add an option that allows disabling automatic snapshots for drop/truncate? (but that does not block explicitly requested snapshots)
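The requested behavior (automatic snapshots on drop/truncate can be switched off, while explicitly requested snapshots always work) might be modeled like this. It is a toy in-memory sketch, not the committed patch: `ColumnFamilySketch` and all of its members are hypothetical, and the flag name `autoSnapshot` merely follows the wording used in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory model of a column family; hypothetical, not Cassandra's ColumnFamilyStore. */
final class ColumnFamilySketch {
    private final boolean autoSnapshot;
    private final List<String> rows = new ArrayList<>();
    private final List<String> snapshots = new ArrayList<>();

    ColumnFamilySketch(boolean autoSnapshot) {
        this.autoSnapshot = autoSnapshot;
    }

    void insert(String row) {
        rows.add(row);
    }

    /** Explicitly requested snapshot: always honored, regardless of autoSnapshot. */
    void snapshot(String name) {
        snapshots.add(name);
    }

    /** Removes all rows, taking an automatic snapshot first only if autoSnapshot is on. */
    void truncate() {
        if (autoSnapshot) {
            snapshot("truncate-" + System.currentTimeMillis());
        }
        rows.clear();
    }

    int rowCount() {
        return rows.size();
    }

    int snapshotCount() {
        return snapshots.size();
    }
}
```

Keeping the check inside `truncate()` rather than inside `snapshot()` is what leaves explicit snapshots unaffected by the option.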

          Dave Brosius added a comment -

          against trunk

          Christian Spriegel added a comment -

          Hi Dave, does this patch also disable snapshots for truncate?

          I would expect an "if" in ColumnFamilyStore.truncate() (line 1698).

          Dave Brosius added a comment -

          add autoSnapshot option for ColumnFamilyStore.truncate as well.

          Jonathan Ellis added a comment -

          This is a new one for me:

          $ patch -p1 < auto_snapshot_2.diff
          missing header for unified diff at line 2 of patch
          
          Dave Brosius added a comment -

          sorry, try again... auto_snapshot_3.diff against trunk.

          Jonathan Ellis added a comment -

          Committed, thanks.

          Out of curiosity, what was the difference in how those diffs were created?

          Dave Brosius added a comment -

          Generated the same way - I'm guessing just fat fingers, on edit/review. Don't know.


            People

            • Assignee:
              Dave Brosius
              Reporter:
              Brandon Williams
              Reviewer:
              Jonathan Ellis
            • Votes:
              6
              Watchers:
              9
