Cassandra / CASSANDRA-556

nodeprobe snapshot to support specific column families

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: 1.1.1
    • Component/s: Core
    • Labels:

      Description

      It would be good to support dumping specific column families via nodeprobe for backup purposes.

      In my particular case, most of the Cassandra data doesn't need to be backed up, except for a couple of column families containing user settings, profiles, etc.

      1. cf_snapshots_556.diff (10 kB, Dave Brosius)
      2. cf_snapshots_556_2A.diff (12 kB, Dave Brosius)
      3. cf_snapshots_556_2.diff (11 kB, Dave Brosius)

        Activity

        Victor Z. Peng added a comment -

        Just to clarify: does this mean dumping specific column families for specific tables? So the 'snapshot' command will take both table names and column family names?

        Jonathan Ellis added a comment -

        right

        Victor Z. Peng added a comment -

        Hi, Jonathan. I want to work on this bug, but I couldn't find where to file a ticket. Could you give me some guideline? Thank you.

        Jonathan Ellis added a comment -

        This is the only ticketing system we use. Maybe I'm not understanding the question.

        Victor Z. Peng added a comment -

        Sorry, what I mean is: how do I get this bug assigned to me?

        Jonathan Ellis added a comment -

        An admin has to edit the Contributor list, but in the meantime, commenting here as you did is sufficient.

        Victor Z. Peng added a comment -

        It seems the implementation is not very hard: we just need to add a few helper functions that can handle a request to snapshot a subset of column families, since org.apache.cassandra.db.Table#snapshot already snapshots column families one by one.

        One thing worth discussing is how we support specifying both multiple tables and multiple column families simultaneously via NodeTool on the CLI.

        What I propose is that we add a '-c' option, which accepts an argument listing the column family names. Since an option can accept at most one argument, the argument can be formatted as a comma-separated string: "-c col1,col2,col3". When -c is absent, we default to all column families. Does this sound ok?
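
        A minimal Java sketch of how the proposed "-c" value could be parsed, assuming it arrives as a single string; the class name ColumnFamilyOption is hypothetical and not part of NodeTool. An absent option (null) yields an empty list, which a caller would interpret as "all column families".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the proposed "-c cf1,cf2,cf3" option; not NodeTool's actual code.
final class ColumnFamilyOption {
    private ColumnFamilyOption() {}

    /**
     * Splits a comma-separated "-c" value into column family names.
     * Returns an empty list when the option is absent (null); the caller
     * treats an empty list as "snapshot all column families".
     */
    static List<String> parse(String optionValue) {
        if (optionValue == null) {
            return Collections.emptyList(); // option absent: default to all CFs
        }
        List<String> names = new ArrayList<>();
        for (String part : optionValue.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name); // skip blanks from "a,,b" or a trailing comma
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(parse("col1,col2,col3")); // [col1, col2, col3]
        System.out.println(parse(null).isEmpty());    // true, i.e. all column families
    }
}
```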

        Jonathan Ellis added a comment -

        I think it would be more consistent w/ the other options to just add an optional (single) columnfamily argument at the end. If you want to snapshot more than one you can always issue multiple snapshot commands.

        Victor Z. Peng added a comment -

        You are right! I want to start on this fix now. It will be my first fix for Cassandra.

        3 more questions:
        1) Do I have to wait until I have been assigned to this bug, or can I just write and submit a patch?
        2) Are there no tests for NodeTool?
        3) The NodeTool wiki page is not updated?

        Jonathan Ellis added a comment -

        Right.

        Victor Z. Peng added a comment -

        I assume we don't need to support a specific column family for CLEARSNAPSHOT? It's harder to implement, and I don't think we have a use case for it in practice at the moment.

        Jonathan Ellis added a comment -

        Sounds reasonable.

        Jonathan Ellis added a comment -

        Could you have a look at this, Dave?

        Dave Brosius added a comment -

        Added a new command

        snapshot_columnfamily keyspace columnfamily {-t snapshotname}

        because hijacking the existing snapshot command is problematic:

        1) you can specify 1-n keyspaces, so disambiguating which argument is the keyspace and which is the column family is difficult.

        2) a column family name could exist in multiple keyspaces.

        applied in trunk
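
        For illustration only, a hypothetical parser for the snapshot_columnfamily syntax above (class and field names are assumptions, not the code in the attached patches). Because the command takes exactly one keyspace and one column family as positional arguments, it never has to guess which argument is which, which is the ambiguity described in point 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for: snapshot_columnfamily keyspace columnfamily [-t snapshotname]
// Names are illustrative; this is not the code from the attached patches.
final class SnapshotCfCommand {
    final String keyspace;
    final String columnFamily;
    final String tag; // null when -t is not given

    private SnapshotCfCommand(String keyspace, String columnFamily, String tag) {
        this.keyspace = keyspace;
        this.columnFamily = columnFamily;
        this.tag = tag;
    }

    static SnapshotCfCommand parse(String[] args) {
        String tag = null;
        List<String> positional = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-t".equals(args[i])) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-t requires a snapshot name");
                }
                tag = args[++i]; // consume the flag's value
            } else {
                positional.add(args[i]);
            }
        }
        // Exactly two positionals: keyspace first, column family second, so a
        // CF name that exists in several keyspaces is always qualified.
        if (positional.size() != 2) {
            throw new IllegalArgumentException(
                "usage: snapshot_columnfamily <keyspace> <columnfamily> [-t snapshotname]");
        }
        return new SnapshotCfCommand(positional.get(0), positional.get(1), tag);
    }
}
```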

        Jonathan Ellis added a comment -

        Thanks, Dave!

        I think it would be good to split up the method calls at the JMX level as well, since it doesn't really make sense to apply a specific CF name AND multiple keyspaces at the same time. What do you think?

        Nit: help in nodecommand adds a second line for "snapshot" instead of "snapshot_columnfamily"

        Dave Brosius added a comment -

        Sure, that's fine; I'll fix it tonight. I just wanted to make sure folks were ok with splitting the command as it is.

        Jonathan Ellis added a comment -

        "Just wanted to make sure folks were ok with splitting the command as it is"

        I guess the main alternative would be to add more -flags... I'm okay breaking backwards compatibility there.

        Dave Brosius added a comment -

        The issue with -flags is that you then have the potential situation of n keyspaces with a cf name, which might be confusing... hopefully people don't have the same cf name in multiple keyspaces. -flags is also different from the way other commands handle cfs. But I'm fine with doing it that way as well. If that were the case, there would be only one jmx call, I would think.

        Jonathan Ellis added a comment -

        "then you have the potential situation of n keyspaces with a cf name"

        Not sure I follow, could you elaborate?

        Dave Brosius added a comment -

        If one did

        nodetool snapshot -cf foo

        that could potentially take snapshots of multiple 'foo's (one in each of several keyspaces), which the admin might not realize... right? Or am I wrong, and CF names are unique across the cluster?

        Jonathan Ellis added a comment -

        Ah, I see. Quite right, CF names are not unique. (So what you could do is check the schema nodetool-side and spit back a "which KS did you want to snapshot CF in?" error...)
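
        A minimal sketch of that nodetool-side check, assuming the client already holds the schema as a map from keyspace name to column family names (how it is fetched is left out, and CfResolver is a hypothetical name). A name found in more than one keyspace produces the "which KS?" error instead of a silent multi-keyspace snapshot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical client-side check: find the keyspace that owns a bare column family name.
// The schema map (keyspace -> column family names) is assumed to be fetched elsewhere.
final class CfResolver {
    private CfResolver() {}

    static String resolveKeyspace(Map<String, Set<String>> schema, String columnFamily) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : schema.entrySet()) {
            if (entry.getValue().contains(columnFamily)) {
                matches.add(entry.getKey());
            }
        }
        if (matches.isEmpty()) {
            throw new IllegalArgumentException("no column family named " + columnFamily);
        }
        if (matches.size() > 1) {
            // CF names are not unique across keyspaces, so ask instead of guessing.
            throw new IllegalArgumentException("column family " + columnFamily
                + " exists in keyspaces " + matches
                + "; which keyspace did you want to snapshot it in?");
        }
        return matches.get(0);
    }
}
```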

        Dave Brosius added a comment -

        Reworked to use only the snapshot command and honor an optional -cf flag for the column family. If the -cf flag is used, insist that one and only one keyspace is specified.

        patch against trunk

        the jmx call will not be backwards compatible.
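
        A sketch of the single-entry-point rule described here, using a hypothetical method that takes an optional column family plus a keyspace list; it is not the signature in cf_snapshots_556_2.diff. When a column family is given, exactly one keyspace is required. Treating an empty keyspace list as "all keyspaces" is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-method variant: one entry point with an optional column family.
// Not the actual Cassandra JMX signature; names are illustrative.
final class SingleMethodSnapshot {
    /** Targets that would be snapshotted, as "keyspace" or "keyspace.columnfamily". */
    final List<String> taken = new ArrayList<>();
    /** Tag passed with the most recent successful call. */
    String lastTag;

    void takeSnapshot(String tag, String columnFamily, String... keyspaces) {
        // A column family name is only unambiguous inside a single keyspace.
        if (columnFamily != null && keyspaces.length != 1) {
            throw new IllegalArgumentException(
                "-cf requires exactly one keyspace, got " + keyspaces.length);
        }
        lastTag = tag;
        if (keyspaces.length == 0) {
            taken.add("*"); // assumption: no keyspace listed means all keyspaces
            return;
        }
        for (String ks : keyspaces) {
            taken.add(columnFamily == null ? ks : ks + "." + columnFamily);
        }
    }
}
```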

        Dave Brosius added a comment -

        cf_snapshots_556_2A.diff is an alternative if you want two separate jmx methods (couldn't tell what you wanted).

        also against trunk.
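
        A sketch of the two-method shape that 2A offers, under assumed names (SnapshotOps, takeSnapshot and takeColumnFamilySnapshot are illustrative, not copied from the diff): whole-keyspace snapshots and single column family snapshots get separate entry points, and the command-line side picks one based on whether a column family was given.

```java
// Hypothetical two-method variant: separate entry points for whole keyspaces
// and for a single column family. Names are illustrative assumptions.
interface SnapshotOps {
    void takeSnapshot(String tag, String... keyspaces);

    void takeColumnFamilySnapshot(String keyspace, String columnFamily, String tag);
}

final class SnapshotDispatch {
    private SnapshotDispatch() {}

    /** Picks the entry point from already-parsed command-line values (columnFamily may be null). */
    static void run(SnapshotOps ops, String tag, String columnFamily, String... keyspaces) {
        if (columnFamily == null) {
            ops.takeSnapshot(tag, keyspaces); // any number of keyspaces
            return;
        }
        if (keyspaces.length != 1) {
            throw new IllegalArgumentException(
                "a column family snapshot needs exactly one keyspace, got " + keyspaces.length);
        }
        ops.takeColumnFamilySnapshot(keyspaces[0], columnFamily, tag);
    }
}
```

        Keeping the two calls separate means neither signature has to carry a parameter that is meaningless for the other case, which is the point raised earlier about not combining a specific CF name with multiple keyspaces.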

        Jonathan Ellis added a comment -

        2A lgtm. committed to 1.1.1 and trunk. thanks!

        Jonathan Ellis added a comment -

        Trivium: this was our oldest open issue.


          People

          • Assignee: Dave Brosius
          • Reporter: Chris Were
          • Reviewer: Jonathan Ellis
          • Votes: 0
          • Watchers: 3
