Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: 0.7 beta 1
    • Component/s: Core
    • Labels:
      None

      Description

      Sometimes you want to delete an entire columnfamily. If there is a lot of data, it's much faster to just insert something to the commitlog saying "truncated," and drop the memtable and data files.

      Probably should require this to block for all replicas to ack to avoid unpleasant surprises. Or make it local-only and have ops manage making sure it gets to all replicas.

      1. ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-531-add-truncate-truncateBefore.txt
        7 kB
        Jonathan Ellis
      2. ASF.LICENSE.NOT.GRANTED--0002-add-unimplemented-truncate-to-thrift-API.txt
        25 kB
        Jonathan Ellis
      3. CASSANDRA-531.patch
        69 kB
        Ran Tavory
      4. CASSANDRA-531.patch
        65 kB
        Ran Tavory
      5. CASSANDRA-531.patch
        38 kB
        Jonathan Ellis
      6. CASSANDRA-531.patch
        64 kB
        Ran Tavory

        Activity

        Jonathan Ellis added a comment -

        This will support taking a timestamp (drop all sstables older than X).

        Jonathan Ellis added a comment -

        progress so far:

        02
        add (unimplemented) truncate to thrift API

        01
        CASSANDRA-531 add truncate, truncateBefore

        If someone wants to wire up the thrift api to CFS.truncate, that would be awesome.

        Let's use negative timestamp to indicate "truncate all".

        The StorageProxy method should throw UnavailableException if any replica is down, or if there is any node movement going on (any pending ranges in metadata).
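
The availability rule described above can be sketched as a precheck that runs before any truncate message is sent. This is only an illustration under stated assumptions: `ClusterView`, `TruncatePrecheck`, and `ReplicaUnavailableException` are hypothetical names standing in for whatever StorageProxy and the ring metadata actually expose; they are not Cassandra's API.

```java
import java.util.Collection;

// Hypothetical stand-in for the UnavailableException mentioned above.
class ReplicaUnavailableException extends Exception
{
    ReplicaUnavailableException(String message)
    {
        super(message);
    }
}

// Hypothetical, minimal view of cluster state (assumption, not a real interface).
interface ClusterView
{
    Collection<String> allEndpoints();   // every node that must ack the truncate
    boolean isAlive(String endpoint);    // failure-detector verdict for one node
    boolean hasPendingRanges();          // true while any node movement is in progress
}

final class TruncatePrecheck
{
    private TruncatePrecheck() {}

    // Refuse to start a truncate unless the ring is stable and every replica is up.
    static void ensureSafeToTruncate(ClusterView cluster) throws ReplicaUnavailableException
    {
        if (cluster.hasPendingRanges())
            throw new ReplicaUnavailableException("node movement in progress");

        for (String endpoint : cluster.allEndpoints())
        {
            if (!cluster.isAlive(endpoint))
                throw new ReplicaUnavailableException("replica down: " + endpoint);
        }
    }
}
```

Failing fast here, before any node has dropped data, is what avoids a partially applied truncate when a replica is unreachable.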

        Jonathan Ellis added a comment -

        because of how compaction works (see CASSANDRA-604 for how this can get tricky) we will want to add an option to disable compactions on a per-CF basis if we're going to allow time-based truncation (and i think we should, the efficiency win is enormous for models that need it)

        Jonathan Ellis added a comment -

        CASSANDRA-699 is probably a better approach for "I want to delete data older than X," so IMO truncate should focus on the "wipe everything clean," i.e., let's drop the "truncateBefore" stuff at least initially.

        It would be nice to add a Truncation "super-tombstone" to the commitlog so that we can replay and repair the Truncate in the face of node failure but I think the biggest use case here is for testing and/or playing, so I'm fine with punting on that until someone actually needs it.

        Ran Tavory added a comment -

        Make code up to date with recent trunk (0.7.0 trunk)
        This patch includes: thrift and ColumnFamilyStore code (and test).
        TODO:
        1. wire up the thrift call from CassandraServer to here
        2. Do I need to also delete actual files? Have I covered all scenarios?
        3. Send the truncate operation to all nodes; make sure they are all up before doing so
        4. Add truncate to the JMX interface
        5. Add truncate to nodetool

        Ran Tavory added a comment -

        In this patch I think I completed all the thrift wiring work, including internal messaging to all hosts in the cluster.
        The patch includes all previous changes, no need to use the previous patches.
        I've also removed the timestamp from the truncation signature since it's not used.

        TODO:
        Delete the actual files???... not sure what I should do... What's the effect of marking all sstables as compacted?
        Add truncate to the JMX interface
        Add truncate to nodetool
        Add system tests to test nodetool + JMX

        Ran Tavory added a comment -

        In this patch I think I completed all the thrift wiring work, including internal messaging to all hosts in the cluster.
        The patch includes all previous changes, no need to use the previous patches.
        I've also removed the timestamp from the truncation signature since it's not used.

        TODO:
        Add truncate to the JMX interface
        Add truncate to nodetool
        Add system tests to test nodetool + JMX

        Ran Tavory added a comment -

        Update patch to latest svn version (There was a change in login check methods)

        Ran Tavory added a comment -

        Remove cassandra.yaml. It wasn't intended to be here

        Ran Tavory added a comment -

        Fix system tests.
        Seems that login was removed from the API or something; now there's set_keyspace...

        Ran Tavory added a comment -

        Added all the missing JMX parts.
        Now it's done, ready for review

        Jonathan Ellis added a comment -

        getting 8 or 9 patch failures, can you rebase to latest trunk?

        Ran Tavory added a comment -

        Update patch to revision 940507

        Ran Tavory added a comment -

        I rebased, but it's strange that you got so many failures. These are all the updates I got from trunk:

        ~/dev/cassandra/trunk $ svn up
        U build.xml
        Updated to revision 940507.

        Try now?

        Jonathan Ellis added a comment - - edited

        Patch applies cleanly now.

        Sorry to throw this back again, but can you submit a patch that doesn't mess with whitespace on otherwise-unedited lines?

        (cleaning up whitespace in a separate patch is good, but mixing it in with "real" changes pollutes the patch.)

        also, if you can conform to http://wiki.apache.org/cassandra/CodeStyle even where it's idiosyncratic, it helps keep the code base uniform. in particular, argument alignment, use of public final fields, and throwing RuntimeException instead of logging an error in catch statements.

        Thanks!

        Ran Tavory added a comment -

        Code cleanup to conform to the coding guidelines.

        Ran Tavory added a comment -

        sorry for the mess, code is cleaned up and all whitespace changes are removed.

        Jonathan Ellis added a comment -

        Attached is a version with a few changes. The major one from a functionality perspective is, we rely on the markCompacted to delete the files eventually. Manually deleting them ahead of that can cause errors on concurrent reads.

        Two things still need work. One is, we don't want to delete data that we didn't snapshot; that could be a nasty surprise. I sketched out what I think that could look like; SSTableReader.newSince is left as a TODO.

        The other is minor – truncate from cmdline should go in clustertool, not nodetool. (Nodetool is for per-node ops.)
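
To illustrate why marking a table as compacted is safer than deleting its files immediately, here is a minimal sketch of deferred deletion. It uses simple reader counting, which is an assumption chosen for clarity; the thread does not say how markCompacted defers the delete, and `DeferredDeleteTable` and its `deleteFiles` callback are hypothetical names.

```java
// Hypothetical sketch: files are removed only after the table is marked
// compacted AND the last in-flight read has finished.
final class DeferredDeleteTable
{
    private final Runnable deleteFiles; // hypothetical callback that removes the data files
    private int activeReaders = 0;
    private boolean compacted = false;
    private boolean deleted = false;

    DeferredDeleteTable(Runnable deleteFiles)
    {
        this.deleteFiles = deleteFiles;
    }

    // Returns false once the table is marked compacted, so new reads skip it.
    synchronized boolean beginRead()
    {
        if (compacted)
            return false;
        activeReaders++;
        return true;
    }

    synchronized void endRead()
    {
        if (activeReaders == 0)
            throw new IllegalStateException("endRead without matching beginRead");
        activeReaders--;
        maybeDelete();
    }

    // Truncate calls this instead of deleting the files directly.
    synchronized void markCompacted()
    {
        compacted = true;
        maybeDelete();
    }

    synchronized boolean isDeleted()
    {
        return deleted;
    }

    // Always called while holding the lock; runs the delete at most once.
    private void maybeDelete()
    {
        if (compacted && activeReaders == 0 && !deleted)
        {
            deleted = true;
            deleteFiles.run();
        }
    }
}
```

A read that started before the truncate keeps its files until it calls `endRead()`, which is the failure mode that eager deletion would hit.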

        Jonathan Ellis added a comment -

        Oh, also the commitlog context stuff is redundant since snapshot flushes, and flush already marks the commitlog "don't bother replaying stuff before this" and cleans up obsolete segments. I removed the CommitLog method calls, but I didn't r/m the ctx creation, so that can go too.

        Ran Tavory added a comment -

        Moving the truncate operation from nodetool to cluster tool was easy enough.
        I removed the unused ctx as well.

        I don't know how to implement the SSTableReader.newSince (or SSTable.newSince). need help here...

        I'll upload the patch with my recent changes as mentioned above.

        Ran Tavory added a comment -

        Move truncate from nodetool to clustertool
        remove unused ctx variable

        Jonathan Ellis added a comment -

        in my mind newsince looks like this:

        SSTable gets a new field to tell us "i don't have any data newer than this epoch." maxDataAge maybe?

        when flushing, this gets set to CurrentTimeMillis. When compacting, it gets set to max(maxDataAge) of any of the compaction sources.

        For truncate's purposes, this field doesn't need to be serialized to disk – take the file's time on disk as maxDataAge when loading.

        so newSince would just return maxDataAge >= truncateBegan.

        does that make sense?
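
The maxDataAge proposal above can be sketched roughly as follows. `SketchSSTable` and its factory methods are hypothetical and model only the one field under discussion; passing the clock value in as a parameter (rather than calling `System.currentTimeMillis()` directly) is an assumption made so the example is deterministic.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical, simplified stand-in for an sstable.
final class SketchSSTable
{
    final String name;
    final long maxDataAge; // epoch millis; the table holds no data newer than this

    private SketchSSTable(String name, long maxDataAge)
    {
        this.name = name;
        this.maxDataAge = maxDataAge;
    }

    // On flush, maxDataAge is the time of the flush.
    static SketchSSTable flushed(String name, long nowMillis)
    {
        return new SketchSSTable(name, nowMillis);
    }

    // On compaction, maxDataAge is the newest maxDataAge among the sources.
    static SketchSSTable compacted(String name, Collection<SketchSSTable> sources)
    {
        if (sources.isEmpty())
            throw new IllegalArgumentException("compaction needs at least one source");
        long max = Long.MIN_VALUE;
        for (SketchSSTable source : sources)
            max = Math.max(max, source.maxDataAge);
        return new SketchSSTable(name, max);
    }

    // True if the table may contain data written at or after the truncate began.
    boolean newSince(long truncatedAt)
    {
        return maxDataAge >= truncatedAt;
    }

    // Tables that cannot contain post-truncate data and are safe to drop.
    static List<SketchSSTable> truncatable(Collection<SketchSSTable> live, long truncatedAt)
    {
        List<SketchSSTable> result = new ArrayList<SketchSSTable>();
        for (SketchSSTable table : live)
        {
            if (!table.newSince(truncatedAt))
                result.add(table);
        }
        return result;
    }
}
```

Note that the check is conservative: a compacted table inherits the newest age of its sources, so it is kept if any one of them might hold post-truncate writes.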

        Ran Tavory added a comment -

        Added the maxDataAge and newSince in SSTableReader, so it's ready for another review

        Jonathan Ellis added a comment -

        committed, with some changes to make maxDataAge final, and the addition of a 2nd flush to truncate(). so the start of truncate now looks like

                // snapshot will also flush, but we want to truncate the most possible, and anything in a flush written
                // after truncatedAt won't be truncated.
                try
                {
                    forceBlockingFlush();
                }
                catch (Exception e)
                {
                    throw new RuntimeException(e);
                }
        
                final long truncatedAt = System.currentTimeMillis();
                snapshot(Table.getTimestampedSnapshotName("before-truncate"));
        

          People

          • Assignee:
            Ran Tavory
            Reporter:
            Jonathan Ellis
          • Votes:
            2
            Watchers:
            9
