Cassandra
  1. Cassandra
  2. CASSANDRA-4049

Add generic way of adding SSTable components required custom compaction strategy

    Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Fix Version/s: 1.2.0 beta 2
    • Component/s: Core
    • Labels:

      Description

      CFS compaction strategy coming up in the next DSE release needs to store some important information in Tombstones.db and RemovedKeys.db files, one per sstable. However, currently Cassandra issues warnings when these files are found in the data directory. Additionally, when switched to SizeTieredCompactionStrategy, the files are left in the data directory after compaction.

      The attached patch adds new components to the Component class so Cassandra knows about those files.

      1. pluggable_custom_components-1.1.5.patch
        16 kB
        Piotr Kołaczkowski
      2. pluggable_custom_components-1.1.5-2.patch
        17 kB
        Piotr Kołaczkowski
      3. 4049-v3.txt
        13 kB
        Jonathan Ellis
      4. 4049-v4.txt
        13 kB
        Piotr Kołaczkowski
      5. 4049-v5.txt
        13 kB
        Piotr Kołaczkowski

        Activity

        Hide
        Stu Hood added a comment -

        I'd prefer that if different compaction strategies are going to require different components, we allow them to specify their components in a pluggable fashion, so that we can appropriately clean up one strategy and apply another, rather than leaving the files around unnecessarily.

        Show
        Stu Hood added a comment - I'd prefer that if different compaction strategies are going to require different components, we allow them to specify their components in a pluggable fashion, so that we can appropriately clean up one strategy and apply another, rather than leaving the files around unnecessarily.
        Hide
        Sylvain Lebresne added a comment -

        Agreed with Stu.

        Show
        Sylvain Lebresne added a comment - Agreed with Stu.
        Hide
        Piotr Kołaczkowski added a comment -

        If it could be made pluggable, it would be great!

        Show
        Piotr Kołaczkowski added a comment - If it could be made pluggable, it would be great!
        Hide
        Piotr Kołaczkowski added a comment -

        I looked into the code and I found a chicken-and-egg problem.
        List of components is associated with each sstable. SSTables have to be loaded before the compaction strategy. If compaction strategy specifies a set of its own additional components, then this list is not accessible at the sstable load time.

        So perhaps the list of sstable components should be kept with sstable itself as an additional component? Let's call it table-of-contents file. Then when opening the sstable, it could be easy to get all the components, irrespective of what really created them.

        What do you think about it?

        Show
        Piotr Kołaczkowski added a comment - I looked into the code and I found a chicken-and-egg problem. List of components is associated with each sstable. SSTables have to be loaded before the compaction strategy. If compaction strategy specifies a set of its own additional components, then this list is not accessible at the sstable load time. So perhaps the list of sstable components should be kept with sstable itself as an additional component? Let's call it table-of-contents file. Then when opening the sstable, it could be easy to get all the components, irrespective of what really created them. What do you think about it?
        Hide
        Piotr Kołaczkowski added a comment -

        I attach a different patch implementing a different, more generic approach:
        1. Files containing .aux extension are treated as "auxiliary" files and are not warned about by cassandra.
        2. Compaction strategies have cleanup method that is called when strategy is changed. Therefore the CS can cleanup its files.

        Show
        Piotr Kołaczkowski added a comment - I attach a different patch implementing a different, more generic approach: 1. Files containing .aux extension are treated as "auxiliary" files and are not warned about by cassandra. 2. Compaction strategies have cleanup method that is called when strategy is changed. Therefore the CS can cleanup its files.
        Hide
        Jonathan Ellis added a comment -

        I'm a little nervous about this. Having descriptors know about only some components feels fragile and likely to cause bugs at some point.

        What if we added Custom1..Custom5 Type entries so that we're not just ignoring them?

        Show
        Jonathan Ellis added a comment - I'm a little nervous about this. Having descriptors know about only some components feels fragile and likely to cause bugs at some point. What if we added Custom1..Custom5 Type entries so that we're not just ignoring them?
        Hide
        Piotr Kołaczkowski added a comment -

        Custom1..Custom5 types would be ok, but we'd like they can at least get meaningful names in the DSE code and in the data-directory. It can pretty fast bite us, if there are files named sblocks-custom1.dat, sblocks-custom2.dat, etc - the name doesn't tell anything what is inside.

        Show
        Piotr Kołaczkowski added a comment - Custom1..Custom5 types would be ok, but we'd like they can at least get meaningful names in the DSE code and in the data-directory. It can pretty fast bite us, if there are files named sblocks-custom1.dat, sblocks-custom2.dat, etc - the name doesn't tell anything what is inside.
        Hide
        Jonathan Ellis added a comment -

        I'm not sure how you can give them meaningful names while keeping it pluggable and not tied to DSE, but I'm open to suggestions.

        Show
        Jonathan Ellis added a comment - I'm not sure how you can give them meaningful names while keeping it pluggable and not tied to DSE, but I'm open to suggestions.
        Hide
        Piotr Kołaczkowski added a comment -

        Hmm, I'm also not sure how to do that, but giving them custom names is important. What if we want to provide two compaction strategies using custom components instead of one? What if user wants to switch from one strategy to another one? We'd have to be very careful that they don't use the same custom component for storing different things. So it would be also quite fragile.

        Show
        Piotr Kołaczkowski added a comment - Hmm, I'm also not sure how to do that, but giving them custom names is important. What if we want to provide two compaction strategies using custom components instead of one? What if user wants to switch from one strategy to another one? We'd have to be very careful that they don't use the same custom component for storing different things. So it would be also quite fragile.
        Hide
        Piotr Kołaczkowski added a comment -

        If Component.Type was not a static enum, we could provide an API to register new component types. So DSE could invoke that API on startup, register all components required for its custom compaction strategies, and then C* would know about all the components, and we could use components with whatever names we like. If you like the idea, I can do it.

        Show
        Piotr Kołaczkowski added a comment - If Component.Type was not a static enum, we could provide an API to register new component types. So DSE could invoke that API on startup, register all components required for its custom compaction strategies, and then C* would know about all the components, and we could use components with whatever names we like. If you like the idea, I can do it.
        Hide
        Piotr Kołaczkowski added a comment -

        Another version of the patch:

        • no code is tied to DSE-related stuff
        • C* knows about all the sstable components and can do the cleanup by itself
        • custom components (e.g. the ones added by DSE) are of type Type.CUSTOM
        • custom components can have meaningful names
        Show
        Piotr Kołaczkowski added a comment - Another version of the patch: no code is tied to DSE-related stuff C* knows about all the sstable components and can do the cleanup by itself custom components (e.g. the ones added by DSE) are of type Type.CUSTOM custom components can have meaningful names
        Hide
        Jonathan Ellis added a comment -

        Sorry for the delay, would you mind rebasing to current trunk?

        Show
        Jonathan Ellis added a comment - Sorry for the delay, would you mind rebasing to current trunk?
        Hide
        Piotr Kołaczkowski added a comment -

        Ok, rebased. Actually only line numbers changed, there was no conflict.

        Show
        Piotr Kołaczkowski added a comment - Ok, rebased. Actually only line numbers changed, there was no conflict.
        Hide
        Jonathan Ellis added a comment -
        .       catch (Exception e)
                {
                    if (!"snapshots".equals(name) && !"backups".equals(name)
                            && !name.contains(".json"))
                        logger.warn("Invalid file '{}' in data directory {}.", name, dir);
                    return null;
                }
        

        What was the reasoning behind this? Not saying it's wrong to remove it, but I want to make sure we understand what it was supposed to do, before deciding we don't need it.

        similarly,

        .       catch (IOException e)
                {
                    Set<Component> components = Sets.newHashSetWithExpectedSize(Component.TYPES.size());
                    for (Component.Type componentType : Component.TYPES)
                    {
                        Component component = new Component(componentType);
                        if (new File(desc.filenameFor(component)).exists())
                            components.add(component);
                    }
        
                    saveTOC(desc, components);
                    return components;
                }
        

        what are we trying to recover from here? ISTM that at the least we should log an error instead of silently skipping over damage.

                    try
                    {
                        if (r != null)
                            r.close();
                    }
                    catch (IOException e)
                    {
                    }
        

        Use FileUtils.closeQuietly. But probably simpler to just use Guava's Files.readLines.

        .       catch (IOException e)
                {
                    logger.error("Error saving TOC for sstable: " + descriptor, e);
                }
        

        Looks to me like we should throw IOError since this Shouldn't Happen.

            public synchronized void addCustomComponent(Component component)
        

        Do we not know what components are necessary at construction time? Would strongly prefer an immutable set. Adding extra parameters to SSTW to do this is fine.

        If we must make it mutable, prefer e.g. CopyOnWriteArraySet to using synchronized. (Otherwise synchronizing write access is not enough for safety, must also synchronize reads.)

        Show
        Jonathan Ellis added a comment - . catch (Exception e) { if (!"snapshots".equals(name) && !"backups".equals(name) && !name.contains(".json")) logger.warn("Invalid file '{}' in data directory {}.", name, dir); return null; } What was the reasoning behind this? Not saying it's wrong to remove it, but I want to make sure we understand what it was supposed to do, before deciding we don't need it. similarly, . catch (IOException e) { Set<Component> components = Sets.newHashSetWithExpectedSize(Component.TYPES.size()); for (Component.Type componentType : Component.TYPES) { Component component = new Component(componentType); if (new File(desc.filenameFor(component)).exists()) components.add(component); } saveTOC(desc, components); return components; } what are we trying to recover from here? ISTM that at the least we should log an error instead of silently skipping over damage. try { if (r != null) r.close(); } catch (IOException e) { } Use FileUtils.closeQuietly. But probably simpler to just use Guava's Files.readLines. . catch (IOException e) { logger.error("Error saving TOC for sstable: " + descriptor, e); } Looks to me like we should throw IOError since this Shouldn't Happen. public synchronized void addCustomComponent(Component component) Do we not know what components are necessary at construction time? Would strongly prefer an immutable set. Adding extra parameters to SSTW to do this is fine. If we must make it mutable, prefer e.g. CopyOnWriteArraySet to using synchronized. (Otherwise synchronizing write access is not enough for safety, must also synchronize reads.)
        Hide
        Piotr Kołaczkowski added a comment - - edited

        Sorry for late reply, been busy fixing things in dse.

        catch (Exception e)
                {
                    if (!"snapshots".equals(name) && !"backups".equals(name)
                            && !name.contains(".json"))
                        logger.warn("Invalid file '{}' in data directory {}.", name, dir);
                    return null;
                }
        

        What was the reasoning behind this? Not saying it's wrong to remove it, but I want to make sure we understand what it was supposed to do, before deciding we don't need it.

        -This check was there before. Actually because this check was firing out warnings, we created this ticket -
        Oh, just noticed, you are referring to removed code. Sylvain Lebresne said he could live without them. But if you insist, I can think of bringing them back. Don't know how to do it yet, but if it is important, I can try.

        On the other hand, I left the warning about missing components (which is IMHO much more important than some spurious components):

        if (!new File(descriptor.filenameFor(component)).exists())
                            logger.error("Missing component: " + descriptor.filenameFor(component));
        

        -----------------

        catch (IOException e)
                {
                    Set<Component> components = Sets.newHashSetWithExpectedSize(Component.TYPES.size());
                    for (Component.Type componentType : Component.TYPES)
                    {
                        Component component = new Component(componentType);
                        if (new File(desc.filenameFor(component)).exists())
                            components.add(component);
                    }
        
                    saveTOC(desc, components);
                    return components;
                }
        

        This one is for backwards compatibility. If we find an SSTable without a TOC component (from previous version of C*), we just do what we always did - loop through all C* components.

        --------------------


        Use FileUtils.closeQuietly

        Oh, yeah. I was looking for IOUtils.closeQuietly, and couldn't find it. Thanks, that is what I needed!


        But probably simpler to just use Guava's Files.readLines

        Ok, I fix it.

        --------------------


        Do we not know what components are necessary at construction time? Would strongly prefer an immutable set. Adding extra parameters to SSTW to do this is fine.

        We do, but there is a chicken-and-egg problem here. CompactionStrategy knows. But CompactionStrategy needs a set of SSTables to be created. And to create SSTable readers you need to know the components. That is why I decided for a TOC component, that allows for reading the list of components at SSTable construction time.

        The workflow of creating a new SSTable is currently as follows:

        1. memtable is flushed to disk, with C* only components
        2. compaction strategy is notified that a new sstable was created and gets an SSTableReader (with only default components)
        3. compaction strategy adds its custom components to it; in order to do it, it has to read some components of SSTable (e.g. access the index or data file)

        In order to make SSTableReader immutable, we had to ask currently installed compaction strategy for custom component list somewhere in the middle of this process, before creating SSTableReader. That is slightly more complex than what we have now (have to change the CS interface), but retaining full immutability is probably worth it.

        public synchronized void addCustomComponent(Component component)

        You are right that synchronized is wrong here.

        Thanks for great suggestions Jonathan Ellis. I look into improving my patch as soon as I'm done with the tickets I've got in the waiting queue for DSE 3.0.

        Show
        Piotr Kołaczkowski added a comment - - edited Sorry for late reply, been busy fixing things in dse. catch (Exception e) { if (!"snapshots".equals(name) && !"backups".equals(name) && !name.contains(".json")) logger.warn("Invalid file '{}' in data directory {}.", name, dir); return null; } What was the reasoning behind this? Not saying it's wrong to remove it, but I want to make sure we understand what it was supposed to do, before deciding we don't need it. - This check was there before. Actually because this check was firing out warnings, we created this ticket - Oh, just noticed, you are referring to removed code. Sylvain Lebresne said he could live without them. But if you insist, I can think of bringing them back. Don't know how to do it yet, but if it is important, I can try. On the other hand, I left the warning about missing components (which is IMHO much more important than some spurious components): if (!new File(descriptor.filenameFor(component)).exists()) logger.error("Missing component: " + descriptor.filenameFor(component)); ----------------- catch (IOException e) { Set<Component> components = Sets.newHashSetWithExpectedSize(Component.TYPES.size()); for (Component.Type componentType : Component.TYPES) { Component component = new Component(componentType); if (new File(desc.filenameFor(component)).exists()) components.add(component); } saveTOC(desc, components); return components; } This one is for backwards compatibility. If we find an SSTable without a TOC component (from previous version of C*), we just do what we always did - loop through all C* components. -------------------- Use FileUtils.closeQuietly Oh, yeah. I was looking for IOUtils.closeQuietly, and couldn't find it. Thanks, that is what I needed! But probably simpler to just use Guava's Files.readLines Ok, I fix it. -------------------- Do we not know what components are necessary at construction time? Would strongly prefer an immutable set. Adding extra parameters to SSTW to do this is fine. We do, but there is a chicken-and-egg problem here. CompactionStrategy knows. But CompactionStrategy needs a set of SSTables to be created. And to create SSTable readers you need to know the components. That is why I decided for a TOC component, that allows for reading the list of components at SSTable construction time. The workflow of creating a new SSTable is currently as follows: 1. memtable is flushed to disk, with C* only components 2. compaction strategy is notified that a new sstable was created and gets an SSTableReader (with only default components) 3. compaction strategy adds its custom components to it; in order to do it, it has to read some components of SSTable (e.g. access the index or data file) In order to make SSTableReader immutable, we had to ask currently installed compaction strategy for custom component list somewhere in the middle of this process, before creating SSTableReader. That is slightly more complex than what we have now (have to change the CS interface), but retaining full immutability is probably worth it. public synchronized void addCustomComponent(Component component) You are right that synchronized is wrong here. Thanks for great suggestions Jonathan Ellis . I look into improving my patch as soon as I'm done with the tickets I've got in the waiting queue for DSE 3.0.
        Hide
        Piotr Kołaczkowski added a comment -

        Improved patch:

        1. Code style - now uses FileUtils, Guava.readLines
        2. Appends TOC instead of overwriting it.
        3. Access to TOC file is protected with ReadWriteLock.
        4. Components collection is a CopyOnWriteArraySet.
        5. No synchronized.

        I really tried to make components collection immutable first, but that opened unfortunately a whole can of worms related to:
        1. SSTable reference counting (two SSTableReader objects sharing data) and deletion
        2. Adding custom components from inside of a notifier (e.g. on memtable flush)
        3. Rebuilding interval tree (that one was easy to fix, though)

        Just didn't want to introduce subtle bugs. SSTable and SSTableReader aren't immutable anyway.

        I performed stress testing with Cassandra FileSystem stress test tool in DSE, and our custom CFS strategy - works fine.

        Show
        Piotr Kołaczkowski added a comment - Improved patch: 1. Code style - now uses FileUtils, Guava.readLines 2. Appends TOC instead of overwriting it. 3. Access to TOC file is protected with ReadWriteLock. 4. Components collection is a CopyOnWriteArraySet. 5. No synchronized. I really tried to make components collection immutable first, but that opened unfortunately a whole can of worms related to: 1. SSTable reference counting (two SSTableReader objects sharing data) and deletion 2. Adding custom components from inside of a notifier (e.g. on memtable flush) 3. Rebuilding interval tree (that one was easy to fix, though) Just didn't want to introduce subtle bugs. SSTable and SSTableReader aren't immutable anyway. I performed stress testing with Cassandra FileSystem stress test tool in DSE, and our custom CFS strategy - works fine.
        Hide
        Piotr Kołaczkowski added a comment - - edited

        Next version of the patch:

        • addCustomComponents marked as public API
        • simplified discoverComponentsFor loop
        • fileLocking removed from SSTable
        • added detection for concurrent access or lightweight file locking (just replace lockOrFail calls with lock if you want locking back)
        Show
        Piotr Kołaczkowski added a comment - - edited Next version of the patch: addCustomComponents marked as public API simplified discoverComponentsFor loop fileLocking removed from SSTable added detection for concurrent access or lightweight file locking (just replace lockOrFail calls with lock if you want locking back)
        Hide
        Jonathan Ellis added a comment -

        How about this in v3 (attached)?

        It's easy to demonstrate by inspection that the "native" Cassandra code doesn't need to lock the TOC (since it isn't modified once created). So I just synchronize the addComponents method.

        Show
        Jonathan Ellis added a comment - How about this in v3 (attached)? It's easy to demonstrate by inspection that the "native" Cassandra code doesn't need to lock the TOC (since it isn't modified once created). So I just synchronize the addComponents method.
        Hide
        Piotr Kołaczkowski added a comment -

        Jonathan Ellis Looks and works good.

        I added a guard against adding the same component more than once (just a cosmetic thing to avoid duplicate TOC entries, because components is a set anyway). Attached as 4049-v4.txt.

        You can go on and merge it into 1.1.7.

        Show
        Piotr Kołaczkowski added a comment - Jonathan Ellis Looks and works good. I added a guard against adding the same component more than once (just a cosmetic thing to avoid duplicate TOC entries, because components is a set anyway). Attached as 4049-v4.txt. You can go on and merge it into 1.1.7.
        Hide
        Jonathan Ellis added a comment -

        LGTM, committed to trunk. (Unfortunately we're past the point where we can risk regressions in 1.1.)

        Show
        Jonathan Ellis added a comment - LGTM, committed to trunk. (Unfortunately we're past the point where we can risk regressions in 1.1.)
        Hide
        Jonathan Ellis added a comment -

        Had to revert from trunk since it caused SSTablesTest failure. Probably something simple but didn't have time to investigate. 5eb9e1c1576edb90f5b6e2ef975686e87a8c93af is the rebased-to-trunk commit.

        Show
        Jonathan Ellis added a comment - Had to revert from trunk since it caused SSTablesTest failure. Probably something simple but didn't have time to investigate. 5eb9e1c1576edb90f5b6e2ef975686e87a8c93af is the rebased-to-trunk commit.
        Hide
        Piotr Kołaczkowski added a comment -

        Ok, I take a look at it.

        Show
        Piotr Kołaczkowski added a comment - Ok, I take a look at it.
        Hide
        Piotr Kołaczkowski added a comment -

        Fixed the patch so it doesn't break tests - there is no need to create TOC if the SSTable doesn't exist at all.

        diff between the previous patch:

        119c119
        < index 9a29066..b000d2f 100644
        ---
        > index 9a29066..e84f561 100644
        157c157
        < @@ -175,36 +178,51 @@ public abstract class SSTable
        ---
        > @@ -175,36 +178,49 @@ public abstract class SSTable
        184,185d183
        < +                if (components.isEmpty())
        < +                    return components;
        223c221
        < @@ -253,4 +271,64 @@ public abstract class SSTable
        ---
        > @@ -253,4 +269,64 @@ public abstract class SSTable
        
        Show
        Piotr Kołaczkowski added a comment - Fixed the patch so it doesn't break tests - there is no need to create TOC if the SSTable doesn't exist at all. diff between the previous patch: 119c119 < index 9a29066..b000d2f 100644 --- > index 9a29066..e84f561 100644 157c157 < @@ -175,36 +178,51 @@ public abstract class SSTable --- > @@ -175,36 +178,49 @@ public abstract class SSTable 184,185d183 < + if (components.isEmpty()) < + return components; 223c221 < @@ -253,4 +271,64 @@ public abstract class SSTable --- > @@ -253,4 +269,64 @@ public abstract class SSTable
        Hide
        Jonathan Ellis added a comment -

        patch does not apply; cherry-picked 5eb9e1c1576edb90f5b6e2ef975686e87a8c93af and added an isEmpty check.

        Show
        Jonathan Ellis added a comment - patch does not apply; cherry-picked 5eb9e1c1576edb90f5b6e2ef975686e87a8c93af and added an isEmpty check.

          People

          • Assignee:
            Piotr Kołaczkowski
            Reporter:
            Piotr Kołaczkowski
            Reviewer:
            Jonathan Ellis
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development