Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Fix Version/s: 1.0.0
    • Component/s: Core
    • Labels:

      Description

      In CASSANDRA-1608, I proposed some changes on how compaction works. I think it also makes sense to allow the ability to have pluggable compaction per CF. There could be many types of workloads where this makes sense. One example we had at Digg was to completely throw away certain SSTables after N days.

      This ticket addresses making compaction pluggable only.

      1. 0001-move-compaction-code-into-own-package.patch
        144 kB
        Alan Liang
      2. 0002-Pluggable-Compaction-and-Expiration.patch
        209 kB
        Alan Liang
      3. 0001-move-compaction-code-into-own-package.patch
        194 kB
        Alan Liang
      4. 0002-pluggable-compaction.patch
        124 kB
        Alan Liang
      5. 0001-move-compaction-code-into-own-package.patch
        193 kB
        Alan Liang
      6. 0002-pluggable-compaction.patch
        125 kB
        Alan Liang
      7. 0001-move-compaction-code-into-own-package.patch
        193 kB
        Alan Liang
      8. 0002-pluggable-compaction.patch
        125 kB
        Alan Liang
      9. 0001-move-compaction-code-into-own-package.patch
        193 kB
        Alan Liang
      10. 0002-pluggable-compaction.patch
        126 kB
        Alan Liang
      11. 0001-move-compaction-code-into-own-package.patch
        193 kB
        Alan Liang
      12. 0002-pluggable-compaction.patch
        128 kB
        Alan Liang
      13. 0002-pluggable-compaction.patch
        277 kB
        Benjamin Coverston
      14. 0002-pluggable-compaction.patch
        128 kB
        Benjamin Coverston
      15. 0001-pluggable-compaction.patch
        128 kB
        Alan Liang
      16. 0001-pluggable-compaction.patch
        130 kB
        Alan Liang
      17. 0002-rename-major-minor-to-maximal-background-in-Compacti.patch
        12 kB
        Alan Liang
      18. 0001-pluggable-compaction.patch
        130 kB
        Alan Liang
      19. 0002-rename-major-minor-to-maximal-background-in-Compacti.patch
        12 kB
        Alan Liang

        Issue Links

          Activity

          Hide
          Alan Liang added a comment -

          Some TODOs:
          -add mockito dependency to test build only
          -determine why DatabaseDescriptorTest#serDe() fails
          -validation of compaction_strategy_options
          -more tests for expiration of files

          Show
          Alan Liang added a comment - Some TODOs: -add mockito dependency to test build only -determine why DatabaseDescriptorTest#serDe() fails -validation of compaction_strategy_options -more tests for expiration of files
          Hide
          Alan Liang added a comment -

          Updated patch files.

          Show
          Alan Liang added a comment - Updated patch files.
          Hide
          Stu Hood added a comment -
          • Have the AbstractCompactionStrategy class return the default strategy for use in CFMetaData
          • createCompactionStrategyInstance should use FBUtilities.construct(class, readable)
          • Unnecessary method renames in CFMetaData
          • CompactionStrategy instantiation in DatabaseDescriptor duplicates the instantiation in CFMetaData: see what could be put into FBUtilities
          • Whitespace changes in db.ColumnFamily
          • Unnecessary ByteBufferUtil import in ColumnIndexer
          • Are you sure we can remove the major compaction file size threshold?
          • Need to special case the 'expired' directory in SSTable.tryComponentFromFilename
          • handleInsufficientSpaceForCompaction should move inside getBuckets (as mentioned in your TODOs): it would be best if the strategy logged at info/warn for files that don't fit into a bucket that matches the parameters
          • re: the TODO in doExpireCompaction: For correctness' sake, we'll need to invalidate row cache entries that match the expired files, but I would be fine doing that in a separate ticket, because it'll be a little bit involved
          • Try to remove TODOs that are speculative: if there are tasks that are blockers for this ticket, list them here. If they aren't blockers for this ticket, but are worthy tasks, they should be moved into tickets before this is committed
          • Please parse the options for TimestampBucketedCompactionStrategy in the constructor
          • One or two comments explaining the bucketing strategy for TimestampBucketed.getBuckets would be helpful
          • Methods that are public only for testing should be package protected (cf. getBuckets)
          • Seconds would make a better unit for expiration than days
          • See if you can find a way to remove some of the duplication between selectFor(Minor|Major)
          • The AbstractCompactedRow sstableStats reference should move into SSTableWriter... to collect information about a row as it is appended to the writer, you'll probably want to pass it to AbstractCompactedRow.write(file, <stats>). There is an example approach on 2319
          • useOldStatsFile should be descriptive
          • Rename SSTableStats to SSTableMetadata
          • Unnecessary imports in ByteBufferUtil
          • Regarding the disabled test in DatabaseDescriptorTest: it's probably because Maps of CharSequence will not equal one another if one contains UTF8s and the other contains Strings: see if we have another round trip test, and then consider removing that one. It's not the first time its come up

          Awesome work Alan!

          Show
          Stu Hood added a comment - Have the AbstractCompactionStrategy class return the default strategy for use in CFMetaData createCompactionStrategyInstance should use FBUtilities.construct(class, readable) Unnecessary method renames in CFMetaData CompactionStrategy instantiation in DatabaseDescriptor duplicates the instantiation in CFMetaData: see what could be put into FBUtilities Whitespace changes in db.ColumnFamily Unnecessary ByteBufferUtil import in ColumnIndexer Are you sure we can remove the major compaction file size threshold? Need to special case the 'expired' directory in SSTable.tryComponentFromFilename handleInsufficientSpaceForCompaction should move inside getBuckets (as mentioned in your TODOs): it would be best if the strategy logged at info/warn for files that don't fit into a bucket that matches the parameters re: the TODO in doExpireCompaction: For correctness' sake, we'll need to invalidate row cache entries that match the expired files, but I would be fine doing that in a separate ticket, because it'll be a little bit involved Try to remove TODOs that are speculative: if there are tasks that are blockers for this ticket, list them here. If they aren't blockers for this ticket, but are worthy tasks, they should be moved into tickets before this is committed Please parse the options for TimestampBucketedCompactionStrategy in the constructor One or two comments explaining the bucketing strategy for TimestampBucketed.getBuckets would be helpful Methods that are public only for testing should be package protected (cf. getBuckets) Seconds would make a better unit for expiration than days See if you can find a way to remove some of the duplication between selectFor(Minor|Major) The AbstractCompactedRow sstableStats reference should move into SSTableWriter... to collect information about a row as it is appended to the writer, you'll probably want to pass it to AbstractCompactedRow.write(file, <stats>). There is an example approach on 2319 useOldStatsFile should be descriptive Rename SSTableStats to SSTableMetadata Unnecessary imports in ByteBufferUtil Regarding the disabled test in DatabaseDescriptorTest: it's probably because Maps of CharSequence will not equal one another if one contains UTF8s and the other contains Strings: see if we have another round trip test, and then consider removing that one. It's not the first time its come up Awesome work Alan!
          Hide
          Sylvain Lebresne added a comment -

          Looking quickly through that code, it looks a good chunk of the code is here to support the expiring of sstables, and it's pretty much hardcoded. Isn't there a way to encapsulate that better ? All the part about collecting and keeping the max timestamp in a sstable also doesn't really strike me as making the compaction more pluggable.

          I'm not necessarily saying the TimestampBucketedCompactionStrategy is not useful (but I'm not saying it is either), but at the very least I think that what this patch does should be redefined clearly (it's not what I call making compaction pluggable, at least not just that), and I'm pretty sure it does multiple think and that I'm not necessary ok with all these things equally.

          Show
          Sylvain Lebresne added a comment - Looking quickly through that code, it looks a good chunk of the code is here to support the expiring of sstables, and it's pretty much hardcoded. Isn't there a way to encapsulate that better ? All the part about collecting and keeping the max timestamp in a sstable also doesn't really strike me as making the compaction more pluggable. I'm not necessarily saying the TimestampBucketedCompactionStrategy is not useful (but I'm not saying it is either), but at the very least I think that what this patch does should be redefined clearly (it's not what I call making compaction pluggable, at least not just that), and I'm pretty sure it does multiple think and that I'm not necessary ok with all these things equally.
          Hide
          Stu Hood added a comment -

          but at the very least I think that what this patch does should be redefined clearly (it's not what I call making compaction pluggable, at least not just that), and I'm pretty sure it does multiple think and that I'm not necessary ok with all these things equally.

          Agreed. Alan: would you mind overhauling the description of this ticket to better describe the goals?

          As usual, the problem with making something pluggable is that you need at least 2 implementations of the interface to make it worthwhile. We could separate out the TimestampBucketedCompactionStrategy and its required additions into a separate ticket, but it would be a mostly symbolic split. One split that might be reasonable (but which I'm sure Alan would prefer to avoid) would be to add TimeBucketed in this ticket, but to split out implementing expiration into a separate ticket.

          Show
          Stu Hood added a comment - but at the very least I think that what this patch does should be redefined clearly (it's not what I call making compaction pluggable, at least not just that), and I'm pretty sure it does multiple think and that I'm not necessary ok with all these things equally. Agreed. Alan: would you mind overhauling the description of this ticket to better describe the goals? As usual, the problem with making something pluggable is that you need at least 2 implementations of the interface to make it worthwhile. We could separate out the TimestampBucketedCompactionStrategy and its required additions into a separate ticket, but it would be a mostly symbolic split. One split that might be reasonable (but which I'm sure Alan would prefer to avoid) would be to add TimeBucketed in this ticket, but to split out implementing expiration into a separate ticket.
          Hide
          Alan Liang added a comment -

          "Looking quickly through that code, it looks a good chunk of the code is here to support the expiring of sstables, and it's pretty much hardcoded. Isn't there a way to encapsulate that better ?"

          You're right, it might make more sense to allow a strategy to define how it should expire the sstables.

          I'll try and fix the description. But I want to keep the implemented strategies with this ticket because they justify why the interfaces are worthwhile as Stu pointed out above.

          Show
          Alan Liang added a comment - "Looking quickly through that code, it looks a good chunk of the code is here to support the expiring of sstables, and it's pretty much hardcoded. Isn't there a way to encapsulate that better ?" You're right, it might make more sense to allow a strategy to define how it should expire the sstables. I'll try and fix the description. But I want to keep the implemented strategies with this ticket because they justify why the interfaces are worthwhile as Stu pointed out above.
          Hide
          Jonathan Ellis added a comment -

          The suggested alternative compaction strategies don't sound very generally useful. We shouldn't maintain them in-tree.

          So what we should do here is provide a CompactionStrategyProvider interface that will give us back CompactionStrategy objects implementing doCompaction, doCleanupCompaction, probably submitMinorIfNeeded, etc. We shouldn't have any provider-specific logic left in CompactionManager itself, it should all be based on the pluggable Strategy.

          Show
          Jonathan Ellis added a comment - The suggested alternative compaction strategies don't sound very generally useful. We shouldn't maintain them in-tree. So what we should do here is provide a CompactionStrategyProvider interface that will give us back CompactionStrategy objects implementing doCompaction, doCleanupCompaction, probably submitMinorIfNeeded, etc. We shouldn't have any provider-specific logic left in CompactionManager itself, it should all be based on the pluggable Strategy.
          Hide
          Stu Hood added a comment -

          The suggested alternative compaction strategies don't sound very generally useful. We shouldn't maintain them in-tree.

          Time bucketing and expiration (as implemented here) are very, very useful for timeseries data, and are in fact a blocker for production use of our timeseries systems. The requirement is that column families which store events at varying resolutions need to decay at different rates: there is no point keeping minute level resolution data indefinitely. Additionally, using TTLs is much, much too fine grained, and requires extra storage for each value.

          We shouldn't have any provider-specific logic left in CompactionManager itself, it should all be based on the pluggable Strategy.

          Agreed, but one approach that I think would be good middle ground would be to move doCompaction and doExpiration onto implementations of an AbstractCompactionTask, to be returned by the strategies that Alan has implemented. The 'task' concept already exists in this patch as an enum that CompactionManager switches on.

          The interesting methods on CompactionStrategy are selectFor(Minor|Major)Compaction, which calculate the possible tasks to perform during a minor or major compaction. For the SizeTieredStrategy (aka, the strategy implemented in trunk), selectForMinor is the same as the previous getBuckets method.


          The configuration changes in the patch are distracting, but it boils down to:

          1. Record the max client timestamp for sstables (useful for CASSANDRA-2498)
          2. Allow for a compaction strategy to choose which files to perform a particular "task" on (bucketing)
          3. Implement a "task" for expiration (N files in, 0 files out)
          4. Add a strategy that uses the max client timestamp to expire old files
          Show
          Stu Hood added a comment - The suggested alternative compaction strategies don't sound very generally useful. We shouldn't maintain them in-tree. Time bucketing and expiration (as implemented here) are very, very useful for timeseries data, and are in fact a blocker for production use of our timeseries systems. The requirement is that column families which store events at varying resolutions need to decay at different rates: there is no point keeping minute level resolution data indefinitely. Additionally, using TTLs is much, much too fine grained, and requires extra storage for each value. We shouldn't have any provider-specific logic left in CompactionManager itself, it should all be based on the pluggable Strategy. Agreed, but one approach that I think would be good middle ground would be to move doCompaction and doExpiration onto implementations of an AbstractCompactionTask, to be returned by the strategies that Alan has implemented. The 'task' concept already exists in this patch as an enum that CompactionManager switches on. The interesting methods on CompactionStrategy are selectFor(Minor|Major)Compaction, which calculate the possible tasks to perform during a minor or major compaction. For the SizeTieredStrategy (aka, the strategy implemented in trunk), selectForMinor is the same as the previous getBuckets method. The configuration changes in the patch are distracting, but it boils down to: Record the max client timestamp for sstables (useful for CASSANDRA-2498 ) Allow for a compaction strategy to choose which files to perform a particular "task" on (bucketing) Implement a "task" for expiration (N files in, 0 files out) Add a strategy that uses the max client timestamp to expire old files
          Hide
          T Jake Luciani added a comment -

          I'm also for an AbstractCompactionTask, we have specific use-cases (such has CassandraFS in brisk) that would greatly benefit from a custom strategy.

          Show
          T Jake Luciani added a comment - I'm also for an AbstractCompactionTask, we have specific use-cases (such has CassandraFS in brisk) that would greatly benefit from a custom strategy.
          Hide
          Jonathan Ellis added a comment -

          Let's keep this to "make compaction pluggable" and add extra strategies separately.

          Show
          Jonathan Ellis added a comment - Let's keep this to "make compaction pluggable" and add extra strategies separately.
          Hide
          Ryan King added a comment -

          Agreed.

          Show
          Ryan King added a comment - Agreed.
          Hide
          Alan Liang added a comment -

          2nd attempt after rebasing with trunk

          Show
          Alan Liang added a comment - 2nd attempt after rebasing with trunk
          Hide
          Alan Liang added a comment -

          rebased to trunk

          Show
          Alan Liang added a comment - rebased to trunk
          Hide
          Alan Liang added a comment - - edited

          Wanted to add a little bit more context. This ticket now only addresses pluggable compaction, I've moved the implementation of a timestamp based compaction to https://issues.apache.org/jira/browse/CASSANDRA-2735.

          This patch makes compaction pluggable in the sense that, you can implement your own AbstractCompactionStrategy. An AbstractCompactionStrategy is responsible for selecting the sstables for minor and major compaction. The strategy returns a list of AbstractCompactionTasks that are to be executed by the CompactionManager. These tasks can be regular compaction, expiration of sstables (see #2735), cleanup tasks, etc. For compaction, a strategy returns a list of CompactionTask's.

          Show
          Alan Liang added a comment - - edited Wanted to add a little bit more context. This ticket now only addresses pluggable compaction, I've moved the implementation of a timestamp based compaction to https://issues.apache.org/jira/browse/CASSANDRA-2735 . This patch makes compaction pluggable in the sense that, you can implement your own AbstractCompactionStrategy. An AbstractCompactionStrategy is responsible for selecting the sstables for minor and major compaction. The strategy returns a list of AbstractCompactionTasks that are to be executed by the CompactionManager. These tasks can be regular compaction, expiration of sstables (see #2735), cleanup tasks, etc. For compaction, a strategy returns a list of CompactionTask's.
          Hide
          Benjamin Coverston added a comment -

          These changes are good. There is an added import to the StreamingTransferTest that doesn't need to be there:

          import org.apache.cassandra.db.compaction.CompactionManager;

          Overall the changes are good, I'm concerned that a few of the tests seem to fail with this patch when they don't fail in trunk right now.

          Get the fixes in and also add some documentation for the new switches in the cassandra.yaml then a +1.

          Show
          Benjamin Coverston added a comment - These changes are good. There is an added import to the StreamingTransferTest that doesn't need to be there: import org.apache.cassandra.db.compaction.CompactionManager; Overall the changes are good, I'm concerned that a few of the tests seem to fail with this patch when they don't fail in trunk right now. Get the fixes in and also add some documentation for the new switches in the cassandra.yaml then a +1.
          Hide
          Alan Liang added a comment -

          Rebased, fixed tests, added documentation in the cli help.

          Show
          Alan Liang added a comment - Rebased, fixed tests, added documentation in the cli help.
          Hide
          Benjamin Coverston added a comment -

          Still looking at it, I'm still seeing failures on the org.apache.cassandra.db.KeyCacheTest and org.apache.cassandra.db.compaction.CompactionsPurgeTest.

          If you'er sure you're not seeing failures I'll push these changes out to another computer and test them there.

          Ben

          Show
          Benjamin Coverston added a comment - Still looking at it, I'm still seeing failures on the org.apache.cassandra.db.KeyCacheTest and org.apache.cassandra.db.compaction.CompactionsPurgeTest. If you'er sure you're not seeing failures I'll push these changes out to another computer and test them there. Ben
          Hide
          Alan Liang added a comment -

          I apologize, I uploaded the wrong diffs. This is the one.

          Show
          Alan Liang added a comment - I apologize, I uploaded the wrong diffs. This is the one.
          Hide
          Benjamin Coverston added a comment -

          CompactionManager.java has a number of duplicate imports you probably want to clean up.

          Other than that this looks good.
          +1

          Show
          Benjamin Coverston added a comment - CompactionManager.java has a number of duplicate imports you probably want to clean up. Other than that this looks good. +1
          Hide
          Alan Liang added a comment -

          Combed through the files and removed unused/duplicate imports

          Show
          Alan Liang added a comment - Combed through the files and removed unused/duplicate imports
          Hide
          Jonathan Ellis added a comment -

          Committed 01 [to 0.8 and to trunk, to make future merges from 0.8 -> trunk less painful], and also moved CompactionIterator, CompactionManager, and the CompactedRow classes into the new compaction package.

          looking at 02 now.

          Show
          Jonathan Ellis added a comment - Committed 01 [to 0.8 and to trunk, to make future merges from 0.8 -> trunk less painful] , and also moved CompactionIterator, CompactionManager, and the CompactedRow classes into the new compaction package. looking at 02 now.
          Hide
          Benjamin Coverston added a comment -

          Moved the AbstractCompactionStrategy to be an interface.

          Show
          Benjamin Coverston added a comment - Moved the AbstractCompactionStrategy to be an interface.
          Hide
          Benjamin Coverston added a comment -

          Removed extraneous .orig

          Show
          Benjamin Coverston added a comment - Removed extraneous .orig
          Hide
          Alan Liang added a comment -

          Removed updateEstimatedCompactions() from strategy since it is no longer called.

          Show
          Alan Liang added a comment - Removed updateEstimatedCompactions() from strategy since it is no longer called.
          Hide
          Jonathan Ellis added a comment -
          • I think Ben's selection of methods for the CompactionStrategy is an improvement, but I do like having an abstract class so it's obvious what the contract is for us vs having to inject parameters post-construction.
          • I'd like to move away from minor/major terms as too tied to the old compaction internals. Perhaps background/maximal instead?
          • We should also make user defined compactions part of ACS – for some strategies (e.g. leveldb) we want to be able to reject user requests that would break strategy invariants. Note that this should probably return a single Task, rather than a list. ("Maximal" will also usually return a single task, but it's cleaner to represent "nothing to do" as an empty list, than as null.)
          • handleInsufficientSpaceForCompaction is a bad encapsulation; it means both it and its caller have to deal with "find a place for an sstable." suggest leaving it up to CT.execute to deal with.

          Here's what I think ACS should end up looking like with these changes:

          /**
           * Puggable compaction strategy determines how SSTables get merged.
           *
           * There are two main goals:
           *  - perform background compaction constantly as needed; this typically makes a tradeoff between
           *    i/o done by compaction, and merging done at read time.
           *  - perform a full (maximum possible) compaction if requested by the user
           */
          public abstract class AbstractCompactionStrategy
          {
              protected final ColumnFamilyStore cfs;
              protected final Map<String, String> options;
          
              protected AbstractCompactionStrategy(ColumnFamilyStore cfs, Map<String, String> options)
              {
                  this.cfs = cfs;
                  this.options = options;
              }
          
              /**
               * @return a list of compaction tasks that should run in the background to get the sstable
               * count down to desired parameters.
               * @param gcBefore throw away tombstones older than this
               */
              public abstract List<AbstractCompactionTask> getBackgroundTasks(final int gcBefore);
          
              /**
               * @return the number of background tasks estimated to still be needed for this columnfamilystore
               */
              public abstract int getEstimatedRemainingTasks();
          
              /**
               * @return a list of compaction tasks that should be run to compact this columnfamilystore
               * as much as possible.
               * @param gcBefore throw away tombstones older than this
               */
              public abstract List<AbstractCompactionTask> getMaximalTasks(final int gcBefore);
          
              /**
               * @return a compaction task corresponding to the requested sstables
               * @param gcBefore throw away tombstones older than this
               */
              public abstract AbstractCompactionTask getUserDefinedTasks(List<SSTableReader> sstables, final int gcBefore);
          }
          
          Show
          Jonathan Ellis added a comment - I think Ben's selection of methods for the CompactionStrategy is an improvement, but I do like having an abstract class so it's obvious what the contract is for us vs having to inject parameters post-construction. I'd like to move away from minor/major terms as too tied to the old compaction internals. Perhaps background/maximal instead? We should also make user defined compactions part of ACS – for some strategies (e.g. leveldb) we want to be able to reject user requests that would break strategy invariants. Note that this should probably return a single Task, rather than a list. ("Maximal" will also usually return a single task, but it's cleaner to represent "nothing to do" as an empty list, than as null.) handleInsufficientSpaceForCompaction is a bad encapsulation; it means both it and its caller have to deal with "find a place for an sstable." suggest leaving it up to CT.execute to deal with. Here's what I think ACS should end up looking like with these changes: /** * Puggable compaction strategy determines how SSTables get merged. * * There are two main goals: * - perform background compaction constantly as needed; this typically makes a tradeoff between * i/o done by compaction, and merging done at read time. * - perform a full (maximum possible) compaction if requested by the user */ public abstract class AbstractCompactionStrategy { protected final ColumnFamilyStore cfs; protected final Map< String , String > options; protected AbstractCompactionStrategy(ColumnFamilyStore cfs, Map< String , String > options) { this .cfs = cfs; this .options = options; } /** * @ return a list of compaction tasks that should run in the background to get the sstable * count down to desired parameters. * @param gcBefore throw away tombstones older than this */ public abstract List<AbstractCompactionTask> getBackgroundTasks( final int gcBefore); /** * @ return the number of background tasks estimated to still be needed for this columnfamilystore */ public abstract int getEstimatedRemainingTasks(); /** * @ return a list of compaction tasks that should be run to compact this columnfamilystore * as much as possible. * @param gcBefore throw away tombstones older than this */ public abstract List<AbstractCompactionTask> getMaximalTasks( final int gcBefore); /** * @ return a compaction task corresponding to the requested sstables * @param gcBefore throw away tombstones older than this */ public abstract AbstractCompactionTask getUserDefinedTasks(List<SSTableReader> sstables, final int gcBefore); } Finally, can you update to conform with http://wiki.apache.org/cassandra/CodeStyle , especially the part about multiline statements?
          Hide
          Alan Liang added a comment - - edited

          I think Ben's selection of methods for the CompactionStrategy is an improvement, but I do like having an abstract class so it's obvious what the contract is for us vs having to inject parameters post-construction.

          I agree, I'll go back to the Abstract class approach.

          I'd like to move away from minor/major terms as too tied to the old compaction internals. Perhaps background/maximal instead?

          Sounds good to me.

          We should also make user defined compactions part of ACS – for some strategies (e.g. leveldb) we want to be able to reject user requests that would break strategy invariants. Note that this should probably return a single Task, rather than a list. ("Maximal" will also usually return a single task, but it's cleaner to represent "nothing to do" as an empty list, than as null.)

          Sounds good to me.

          handleInsufficientSpaceForCompaction is a bad encapsulation; it means both it and its caller have to deal with "find a place for an sstable." suggest leaving it up to CT.execute to deal with.

          Sounds good to me. So if a strategy wants to customize the behavior of handling insufficient space, they'd have to implement their own CompactionTask (or override the existing one). What do you think about that? Another thing is... since space is always a race condition, I could leave it up to the strategy to ensure the sstable it has selected has a reasonable amount of space for compaction.

          I'll resubmit a patch with all these suggestions. Thanks!

          Show
          Alan Liang added a comment - - edited I think Ben's selection of methods for the CompactionStrategy is an improvement, but I do like having an abstract class so it's obvious what the contract is for us vs having to inject parameters post-construction. I agree, I'll go back to the Abstract class approach. I'd like to move away from minor/major terms as too tied to the old compaction internals. Perhaps background/maximal instead? Sounds good to me. We should also make user defined compactions part of ACS – for some strategies (e.g. leveldb) we want to be able to reject user requests that would break strategy invariants. Note that this should probably return a single Task, rather than a list. ("Maximal" will also usually return a single task, but it's cleaner to represent "nothing to do" as an empty list, than as null.) Sounds good to me. handleInsufficientSpaceForCompaction is a bad encapsulation; it means both it and its caller have to deal with "find a place for an sstable." suggest leaving it up to CT.execute to deal with. Sounds good to me. So if a strategy wants to customize the behavior of handling insufficient space, they'd have to implement their own CompactionTask (or override the existing one). What do you think about that? Another thing is... since space is always a race condition, I could leave it up to the strategy to ensure the sstable it has selected has a reasonable amount of space for compaction. I'll resubmit a patch with all these suggestions. Thanks!
          Hide
          Alan Liang added a comment -

          new patch incorporates suggestions by jbellis, also, renamed minor/major -> background/maximal

          Show
          Alan Liang added a comment - new patch incorporates suggestions by jbellis, also, renamed minor/major -> background/maximal
          Hide
          Jonathan Ellis added a comment - - edited

          Alan, it looks like this latest patch is not based on Ben's. I see a bunch of extra methods in ACS that don't belong there, for one thing.

          Show
          Jonathan Ellis added a comment - - edited Alan, it looks like this latest patch is not based on Ben's. I see a bunch of extra methods in ACS that don't belong there, for one thing.
          Hide
          Alan Liang added a comment -

          The only difference is:
          84 public boolean isCompactionDisabled()
          89 public int getMinCompactionThreshold()
          94 public int getMaxCompactionThreshold()

          They were for convenience for the strategy implementer to have all things in one place. I'll remove.

          Show
          Alan Liang added a comment - The only difference is: 84 public boolean isCompactionDisabled() 89 public int getMinCompactionThreshold() 94 public int getMaxCompactionThreshold() They were for convenience for the strategy implementer to have all things in one place. I'll remove.
          Hide
          Jonathan Ellis added a comment -

          committed with some minor changes:

          • moved default strategy class to CFMetaData
          • removed gratuitous use of mockito

          Thanks guys!

          Show
          Jonathan Ellis added a comment - committed with some minor changes: moved default strategy class to CFMetaData removed gratuitous use of mockito Thanks guys!
          Hide
          Hudson added a comment -

          Integrated in Cassandra #920 (See https://builds.apache.org/job/Cassandra/920/)
          rename minor -> background, major -> maximal
          patch by Alan Liang for CASSANDRA-1610
          extract AbstractCompactionStrategy, AbstractCompactionTask
          patch by Alan Liang and Ben Coverston; reviewed by jbellis for CASSANDRA-1610

          jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1134461
          Files :

          • /cassandra/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java
          • /cassandra/trunk/test/unit/org/apache/cassandra/db/TableTest.java
          • /cassandra/trunk/test/unit/org/apache/cassandra/db/RemoveSuperColumnTest.java
          • /cassandra/trunk/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
          • /cassandra/trunk/src/java/org/apache/cassandra/db/HintedHandOffManager.java
          • /cassandra/trunk/test/unit/org/apache/cassandra/io/sstable/SSTableReaderTest.java
          • /cassandra/trunk/test/unit/org/apache/cassandra/db/compaction/OneCompactionTest.java
          • /cassandra/trunk/test/unit/org/apache/cassandra/db/compaction/CompactionsPurgeTest.java
          • /cassandra/trunk/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
          • /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java

          jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1134460
          Files :

          • /cassandra/trunk/src/java/org/apache/cassandra/cli/CliClient.java
          • /cassandra/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java
          • /cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
          • /cassandra/trunk/build.xml
          • /cassandra/trunk/interface/cassandra.thrift
          • /cassandra/trunk/src/java/org/apache/cassandra/db/DataTracker.java
          • /cassandra/trunk/test/long/org/apache/cassandra/db/compaction/LongCompactionSpeedTest.java
          • /cassandra/trunk/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
          • /cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/CfDef.java
          • /cassandra/trunk/test/unit/org/apache/cassandra/db/compaction/CompactionsPurgeTest.java
          • /cassandra/trunk/test/unit/org/apache/cassandra/cli/CliTest.java
          • /cassandra/trunk/src/java/org/apache/cassandra/service/AbstractCassandraDaemon.java
          • /cassandra/trunk/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
          • /cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/KsDef.java
          • /cassandra/trunk/test/unit/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyTest.java
          • /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
          • /cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/CqlResult.java
          • /cassandra/trunk/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
          • /cassandra/trunk/src/avro/internode.genavro
          • /cassandra/trunk/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java
          • /cassandra/trunk/src/java/org/apache/cassandra/db/HintedHandOffManager.java
          • /cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/CqlRow.java
          • /cassandra/trunk/test/unit/org/apache/cassandra/streaming/StreamingTransferTest.java
          • /cassandra/trunk/test/unit/org/apache/cassandra/db/DefsTest.java
          • /cassandra/trunk/src/java/org/apache/cassandra/config/CFMetaData.java
          • /cassandra/trunk/src/java/org/apache/cassandra/db/compaction/AbstractCompactionTask.java
          Show
          Hudson added a comment - Integrated in Cassandra #920 (See https://builds.apache.org/job/Cassandra/920/ ) rename minor -> background, major -> maximal patch by Alan Liang for CASSANDRA-1610 extract AbstractCompactionStrategy, AbstractCompactionTask patch by Alan Liang and Ben Coverston; reviewed by jbellis for CASSANDRA-1610 jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1134461 Files : /cassandra/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java /cassandra/trunk/test/unit/org/apache/cassandra/db/TableTest.java /cassandra/trunk/test/unit/org/apache/cassandra/db/RemoveSuperColumnTest.java /cassandra/trunk/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java /cassandra/trunk/src/java/org/apache/cassandra/db/HintedHandOffManager.java /cassandra/trunk/test/unit/org/apache/cassandra/io/sstable/SSTableReaderTest.java /cassandra/trunk/test/unit/org/apache/cassandra/db/compaction/OneCompactionTest.java /cassandra/trunk/test/unit/org/apache/cassandra/db/compaction/CompactionsPurgeTest.java /cassandra/trunk/src/java/org/apache/cassandra/db/compaction/CompactionManager.java /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1134460 Files : /cassandra/trunk/src/java/org/apache/cassandra/cli/CliClient.java /cassandra/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java /cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java /cassandra/trunk/build.xml /cassandra/trunk/interface/cassandra.thrift /cassandra/trunk/src/java/org/apache/cassandra/db/DataTracker.java /cassandra/trunk/test/long/org/apache/cassandra/db/compaction/LongCompactionSpeedTest.java /cassandra/trunk/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java /cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/CfDef.java /cassandra/trunk/test/unit/org/apache/cassandra/db/compaction/CompactionsPurgeTest.java /cassandra/trunk/test/unit/org/apache/cassandra/cli/CliTest.java /cassandra/trunk/src/java/org/apache/cassandra/service/AbstractCassandraDaemon.java /cassandra/trunk/src/java/org/apache/cassandra/db/compaction/CompactionManager.java /cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/KsDef.java /cassandra/trunk/test/unit/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyTest.java /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java /cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/CqlResult.java /cassandra/trunk/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java /cassandra/trunk/src/avro/internode.genavro /cassandra/trunk/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java /cassandra/trunk/src/java/org/apache/cassandra/db/HintedHandOffManager.java /cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/CqlRow.java /cassandra/trunk/test/unit/org/apache/cassandra/streaming/StreamingTransferTest.java /cassandra/trunk/test/unit/org/apache/cassandra/db/DefsTest.java /cassandra/trunk/src/java/org/apache/cassandra/config/CFMetaData.java /cassandra/trunk/src/java/org/apache/cassandra/db/compaction/AbstractCompactionTask.java

            People

            • Assignee:
              Alan Liang
              Reporter:
              Chris Goffinet
              Reviewer:
              Benjamin Coverston
            • Votes:
              3 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development