Cassandra
  1. Cassandra
  2. CASSANDRA-2477

CQL support for describing keyspaces / column familes

    Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Fix Version/s: 1.1.0
    • Component/s: API, Core
    • Labels:

      Issue Links

        Activity

        Eric Evans created issue -
        Eric Evans made changes -
        Field Original Value New Value
        Fix Version/s 1.0 [ 12316349 ]
        Eric Evans made changes -
        Component/s API [ 12313742 ]
        Component/s Core [ 12312978 ]
        Hide
        Vivek Mishra added a comment -

        This is to do with CQL jdbc support?

        Show
        Vivek Mishra added a comment - This is to do with CQL jdbc support?
        Hide
        Jonathan Ellis added a comment -

        Sort of. It's really about making CQL feature-complete and not needing the old API anymore.

        Show
        Jonathan Ellis added a comment - Sort of. It's really about making CQL feature-complete and not needing the old API anymore.
        Hide
        Jonathan Ellis added a comment -

        Rather than adding new CQL keywords, one alternative would be to take the pgsql approach of building this logic into the shell, by querying system CFs.

        Show
        Jonathan Ellis added a comment - Rather than adding new CQL keywords, one alternative would be to take the pgsql approach of building this logic into the shell, by querying system CFs.
        Jonathan Ellis made changes -
        Labels cql
        Assignee Jonathan Ellis [ jbellis ]
        Fix Version/s 0.8.1 [ 12316368 ]
        Fix Version/s 1.0 [ 12316349 ]
        Hide
        Rick Shaw added a comment -

        Tooling will want to be able to ask the API to describe:

        • List all the Keyspace names.
        • List all the attributes of the Keyspace
        • List all the column families within a Keyspace
        • List all of the column names and types of those columns that have been declared
        • list all of the indexed names and alias names

        I think this will need to be returned by the API at the code level not a shell.

        How about just a special set of CFs in the "System" Keyspace that can be treated as RO data that describes Keyspaces and CFs?

        Show
        Rick Shaw added a comment - Tooling will want to be able to ask the API to describe: List all the Keyspace names. List all the attributes of the Keyspace List all the column families within a Keyspace List all of the column names and types of those columns that have been declared list all of the indexed names and alias names I think this will need to be returned by the API at the code level not a shell. How about just a special set of CFs in the "System" Keyspace that can be treated as RO data that describes Keyspaces and CFs?
        Hide
        Jonathan Ellis added a comment -

        I think this will need to be returned by the API at the code level not a shell.

        You misunderstood; I'm not saying that the shell will be the only client able to access that, but that the shell can have shortcuts ("\d users" vs "select * from pg_class where relname = 'users'").

        Show
        Jonathan Ellis added a comment - I think this will need to be returned by the API at the code level not a shell. You misunderstood; I'm not saying that the shell will be the only client able to access that, but that the shell can have shortcuts ("\d users" vs "select * from pg_class where relname = 'users'").
        Hide
        Rick Shaw added a comment -

        Sorry... No that is a good idea...

        Does CLI go away in deference to this new shell level access to CQL? Like the features in CLI but using CQL syntax would be great!

        Show
        Rick Shaw added a comment - Sorry... No that is a good idea... Does CLI go away in deference to this new shell level access to CQL? Like the features in CLI but using CQL syntax would be great!
        Hide
        Jonathan Ellis added a comment -

        Yes, we want cqlsh or a similar tool to be a full replacement for the existing cli.

        Show
        Jonathan Ellis added a comment - Yes, we want cqlsh or a similar tool to be a full replacement for the existing cli.
        Hide
        Eric Evans added a comment -

        Rather than adding new CQL keywords, one alternative would be to take the pgsql approach of building this logic into the shell, by querying system CFs.

        How will you actually grok the schema without Avro?

        Show
        Eric Evans added a comment - Rather than adding new CQL keywords, one alternative would be to take the pgsql approach of building this logic into the shell, by querying system CFs. How will you actually grok the schema without Avro?
        Hide
        Jonathan Ellis added a comment -

        How will you actually grok the schema without Avro?

        That's the question, isn't it?

        What I want to avoid is a special query type (i.e. anything not "SELECT") because it makes the language less orthogonal because of implementation details that are subject to change (in an ideal world, we'd move away from avro and store schema information in "real" columns, with indexes so you could easily say "give me all the columns for CF X at schema version Y.")

        You'd also be limited to basically an RPC style call – no specifying which columns to select, or which rows you're interested in. Not without reinventing that wheel on a LOT of code (because SELECT right now relies on CFS to perform the actual queries).

        My first stab was to create a class of "virtual" CFs that would contain read-only data for this kind of query. (Attached.) I stopped when I got to a chicken/egg problem: the natural place to update this view CF is DD.setTableDefinition, but that gets called on startup when Table/CFS objects aren't live yet.

        Trying to fake this at the QueryProcessor layer is also unsatisfactory – we come back to the same kind of limitations we'd have with a special-case query type, that we don't have full SELECT support on non-CFS objects without insane amounts of work.

        Starting to think that we'll have to bite the bullet and get rid of avro schema serialization to do this right. Which I do think is doable, but it's more than I want to bite off right now. (Until then, clients like JDBC DatabaseMetadata can continue to use the thrift describe_ calls.)

        Show
        Jonathan Ellis added a comment - How will you actually grok the schema without Avro? That's the question, isn't it? What I want to avoid is a special query type (i.e. anything not "SELECT") because it makes the language less orthogonal because of implementation details that are subject to change (in an ideal world, we'd move away from avro and store schema information in "real" columns, with indexes so you could easily say "give me all the columns for CF X at schema version Y.") You'd also be limited to basically an RPC style call – no specifying which columns to select, or which rows you're interested in. Not without reinventing that wheel on a LOT of code (because SELECT right now relies on CFS to perform the actual queries). My first stab was to create a class of "virtual" CFs that would contain read-only data for this kind of query. (Attached.) I stopped when I got to a chicken/egg problem: the natural place to update this view CF is DD.setTableDefinition, but that gets called on startup when Table/CFS objects aren't live yet. Trying to fake this at the QueryProcessor layer is also unsatisfactory – we come back to the same kind of limitations we'd have with a special-case query type, that we don't have full SELECT support on non-CFS objects without insane amounts of work. Starting to think that we'll have to bite the bullet and get rid of avro schema serialization to do this right. Which I do think is doable, but it's more than I want to bite off right now. (Until then, clients like JDBC DatabaseMetadata can continue to use the thrift describe_ calls.)
        Jonathan Ellis made changes -
        Attachment 2477-virtual-cfs-false-start.txt [ 12479670 ]
        Jonathan Ellis made changes -
        Assignee Jonathan Ellis [ jbellis ]
        Hide
        Vivek Mishra added a comment -

        Hi,
        Rather than introducing a list of new commands.
        Will it be a good option to introduce something similar sysobjects in Oracle and information_schema in mysql.

        Referring RDBMS only for such concepts!

        Show
        Vivek Mishra added a comment - Hi, Rather than introducing a list of new commands. Will it be a good option to introduce something similar sysobjects in Oracle and information_schema in mysql. Referring RDBMS only for such concepts!
        Hide
        Jonathan Ellis added a comment -

        I addressed exactly this in the comment immediately before yours.

        Show
        Jonathan Ellis added a comment - I addressed exactly this in the comment immediately before yours.
        Hide
        Vivek Mishra added a comment -

        Hi,
        Just took a reference from oracle data dictionary:
        dba_all_tables

        • dba_indexes
        • dba_ind_partitions
        • dba_ind_subpartitions
        • dba_object_tables
        • dba_part_col_statistics
        • dba_subpart_col_statistics
        • dba_tables
        • dba_tab_cols
        • dba_tab_columns
        • dba_tab_col_statistics
        • dba_tab_partitions
        • dba_tab_subpartitions

        out of these, which looks to be required for Cassandra?

        dba_all_tables, dba_object_tables, dba_tab_columns ?

        Show
        Vivek Mishra added a comment - Hi, Just took a reference from oracle data dictionary: dba_all_tables dba_indexes dba_ind_partitions dba_ind_subpartitions dba_object_tables dba_part_col_statistics dba_subpart_col_statistics dba_tables dba_tab_cols dba_tab_columns dba_tab_col_statistics dba_tab_partitions dba_tab_subpartitions out of these, which looks to be required for Cassandra? dba_all_tables, dba_object_tables, dba_tab_columns ?
        Hide
        Jonathan Ellis added a comment -

        Probably.

        IMO the right way to think about this is "how can we expose the data we have in KSMetaData and CFMetaData" not "how can we make this look like Oracle."

        Show
        Jonathan Ellis added a comment - Probably. IMO the right way to think about this is "how can we expose the data we have in KSMetaData and CFMetaData" not "how can we make this look like Oracle."
        Hide
        Vivek Mishra added a comment -

        Absolutly.. make more sense.

        Show
        Vivek Mishra added a comment - Absolutly.. make more sense.
        Hide
        Vivek Mishra added a comment -

        From KsMetadata, something which i can see to get CFMetadata, strategy_class and strategy options... Count on number of ColumnFamilies, fetch specific CfMetadata(via cfName)

        Similarly from CFMetadata, we could fetch

        CfName, Default validator, compare type, subcolumn comparator, Keyspace name and many more..

        Is this what we are planning for "Describe Ks" or "Describe cf"?

        Show
        Vivek Mishra added a comment - From KsMetadata, something which i can see to get CFMetadata, strategy_class and strategy options... Count on number of ColumnFamilies, fetch specific CfMetadata(via cfName) Similarly from CFMetadata, we could fetch CfName, Default validator, compare type, subcolumn comparator, Keyspace name and many more.. Is this what we are planning for "Describe Ks" or "Describe cf"?
        Hide
        Rick Shaw added a comment -

        How about a custom CF for each (KS/CF); that fully defines the schema entries for each piece of information to be returned about each KS and each CF.

        Then DESCRIBE returns a CQLResult containing normal row content. The syntax of the DESCRIBE drives a flat set of rows returned to the caller. The CQL QueryProcessor returns the contents of the CQLRow items for information it already has access to in the internal metadata tables.

        Show
        Rick Shaw added a comment - How about a custom CF for each (KS/CF); that fully defines the schema entries for each piece of information to be returned about each KS and each CF. Then DESCRIBE returns a CQLResult containing normal row content. The syntax of the DESCRIBE drives a flat set of rows returned to the caller. The CQL QueryProcessor returns the contents of the CQLRow items for information it already has access to in the internal metadata tables.
        Jonathan Ellis made changes -
        Fix Version/s 0.8.2 [ 12316645 ]
        Fix Version/s 0.8.1 [ 12316368 ]
        Hide
        Jonathan Ellis added a comment -

        What I want to avoid is a special query type (i.e. anything not "SELECT") because it makes the language less orthogonal because of implementation details that are subject to change (in an ideal world, we'd move away from avro and store schema information in "real" columns, with indexes so you could easily say "give me all the columns for CF X at schema version Y.")

        You'd also be limited to basically an RPC style call – no specifying which columns to select, or which rows you're interested in. Not without reinventing that wheel on a LOT of code (because SELECT right now relies on CFS to perform the actual queries).

        Show
        Jonathan Ellis added a comment - What I want to avoid is a special query type (i.e. anything not "SELECT") because it makes the language less orthogonal because of implementation details that are subject to change (in an ideal world, we'd move away from avro and store schema information in "real" columns, with indexes so you could easily say "give me all the columns for CF X at schema version Y.") You'd also be limited to basically an RPC style call – no specifying which columns to select, or which rows you're interested in. Not without reinventing that wheel on a LOT of code (because SELECT right now relies on CFS to perform the actual queries).
        Hide
        Rick Shaw added a comment -

        So from the client side you will need to know the name of the Keyspace and the names of the CFs where this data will be stored and updated by the server. And the server side will need to fully document the current schema description of the CF(s) to do the SELECT on. And keep the CFs updated with any additions and updates to the internal KS and CF metadata. But with that info in hand the client could just issue a SELECT of the involved CF to get the metadata that is currently held in the internal server metadata structures represented by the associated KS/CFs. Is that the plan?

        Show
        Rick Shaw added a comment - So from the client side you will need to know the name of the Keyspace and the names of the CFs where this data will be stored and updated by the server. And the server side will need to fully document the current schema description of the CF(s) to do the SELECT on. And keep the CFs updated with any additions and updates to the internal KS and CF metadata. But with that info in hand the client could just issue a SELECT of the involved CF to get the metadata that is currently held in the internal server metadata structures represented by the associated KS/CFs. Is that the plan?
        Sylvain Lebresne made changes -
        Fix Version/s 0.8.3 [ 12317246 ]
        Fix Version/s 0.8.2 [ 12316645 ]
        Sylvain Lebresne made changes -
        Fix Version/s 0.8.4 [ 12317551 ]
        Fix Version/s 0.8.3 [ 12317246 ]
        Sylvain Lebresne made changes -
        Fix Version/s 0.8.5 [ 12317588 ]
        Fix Version/s 0.8.4 [ 12317551 ]
        Jonathan Ellis made changes -
        Parent CASSANDRA-2472 [ 12504283 ]
        Issue Type Sub-task [ 7 ] New Feature [ 2 ]
        Jonathan Ellis made changes -
        Fix Version/s 0.8.5 [ 12317588 ]
        Priority Major [ 3 ] Minor [ 4 ]
        Hide
        Patricio Echague added a comment -

        would describe_ring call fall into this ticket too ?

        Show
        Patricio Echague added a comment - would describe_ring call fall into this ticket too ?
        Hide
        Jonathan Ellis added a comment -

        I suppose.

        Show
        Jonathan Ellis added a comment - I suppose.
        Hide
        Paul Querna added a comment -

        +1,

        Would be very helpful to expose describe_

        {ring,partitioner,snitch,keyspace,version,cluster_name,schema_versions}

        to CQL.

        Show
        Paul Querna added a comment - +1, Would be very helpful to expose describe_ {ring,partitioner,snitch,keyspace,version,cluster_name,schema_versions} to CQL.
        Jonathan Ellis made changes -
        Fix Version/s 1.2 [ 12319262 ]
        Hide
        Jonathan Ellis added a comment -

        CASSANDRA-1391 should give us this for free. Leaving this open while we see if that works as planned.

        Show
        Jonathan Ellis added a comment - CASSANDRA-1391 should give us this for free. Leaving this open while we see if that works as planned.
        Jonathan Ellis made changes -
        Link This issue is related to CASSANDRA-1391 [ CASSANDRA-1391 ]
        Hide
        Jonathan Ellis added a comment -

        You can SELECT from schema_keyspaces, schema_columnfamilies, and schema_columns now.

        Show
        Jonathan Ellis added a comment - You can SELECT from schema_keyspaces, schema_columnfamilies, and schema_columns now.
        Jonathan Ellis made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 1.1.0 [ 12317615 ]
        Fix Version/s 1.2 [ 12319262 ]
        Resolution Fixed [ 1 ]
        Hide
        Matthew Higgins added a comment -

        I still don't understand how to get the column families from the system tables. In other words, I need to use cqlsh and DESCRIBE COLUMNFAMILIES to list an array of strings containing each column family for a keyspace.

        Show
        Matthew Higgins added a comment - I still don't understand how to get the column families from the system tables. In other words, I need to use cqlsh and DESCRIBE COLUMNFAMILIES to list an array of strings containing each column family for a keyspace.
        Hide
        Jonathan Ellis added a comment -

        cqlsh does the work for you. but if you wanted to do it programatically then you would need to piece the data together via SELECT.

        Show
        Jonathan Ellis added a comment - cqlsh does the work for you. but if you wanted to do it programatically then you would need to piece the data together via SELECT.
        Hide
        Matthew Higgins added a comment -

        How would I get all tables for a keyspace using CQL? cqlsh implements this with thrift api calls, so I have no examples. Selecting from schema_columnfamilies only returns a list of keyspaces.

        I am working on a Ruby gem that is 100% CQL except for a few calls (ie. https://github.com/data-axle/cassandra_object/blob/master/lib/cassandra_object/schema/tasks.rb). I would like to remove the cqlsh dependency.

        Show
        Matthew Higgins added a comment - How would I get all tables for a keyspace using CQL? cqlsh implements this with thrift api calls, so I have no examples. Selecting from schema_columnfamilies only returns a list of keyspaces. I am working on a Ruby gem that is 100% CQL except for a few calls (ie. https://github.com/data-axle/cassandra_object/blob/master/lib/cassandra_object/schema/tasks.rb ). I would like to remove the cqlsh dependency.
        Hide
        Jonathan Ellis added a comment -

        SELECT columnfamily_name FROM schema_columnfamilies WHERE keyspace_name='?'

        Suggest looking at definitions in CFMetaData.java.

        Show
        Jonathan Ellis added a comment - SELECT columnfamily_name FROM schema_columnfamilies WHERE keyspace_name='?' Suggest looking at definitions in CFMetaData.java.
        Hide
        Jonathan Ellis added a comment -

        Also note that some column names changed from 1.1 -> 1.2.

        Show
        Jonathan Ellis added a comment - Also note that some column names changed from 1.1 -> 1.2.
        Hide
        Matthew Higgins added a comment -

        Is columnfamily_name new in 1.2? For v1.1.5, the only column with values in the schema_columnfamilies table is "keyspace".

        Show
        Matthew Higgins added a comment - Is columnfamily_name new in 1.2? For v1.1.5, the only column with values in the schema_columnfamilies table is "keyspace".
        Hide
        Jonathan Ellis added a comment -

        Personally I think trying to support 1.1.x here is not going to end well.

        Show
        Jonathan Ellis added a comment - Personally I think trying to support 1.1.x here is not going to end well.
        Gavin made changes -
        Workflow no-reopen-closed, patch-avail [ 12610558 ] patch-available, re-open possible [ 12753638 ]
        Gavin made changes -
        Workflow patch-available, re-open possible [ 12753638 ] reopen-resolved, no closed status, patch-avail, testing [ 12757549 ]

          People

          • Assignee:
            Unassigned
            Reporter:
            Eric Evans
          • Votes:
            1 Vote for this issue
            Watchers:
            12 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development