Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-11194

materialized views - support explode() on collections

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Open
    • Normal
    • Resolution: Unresolved
    • None
    • None

    Description

      I'm working on a database design to model a product catalog. Products can belong to categories. Categories can belong to multiple sub categories (think about Amazon's complex taxonomies).

      My category table would look like this, giving me individual categories & their parents:

      CREATE TABLE category (
          category_id uuid primary key,
          name text,
          parents set<uuid>
      );
      

      To get a list of all the children of a particular category, I need a table that looks like the following:

      CREATE TABLE categories_by_parent (
          parent_id uuid,
          category_id uuid,
          name text,
          primary key (parent_id, category_id)
      );
      

      The important thing to note here is that a single category can have multiple parents.

      I'd like to propose support for collections in materialized views via an explode() function that would create 1 row per item in the collection. For instance, I'll insert the following 3 rows (2 parents, 1 child) into the category table:

      insert into category (category_id, name, parents) values (009fe0e1-5b09-4efc-a92d-c03720324a4f, 'Parent', null);
      
      insert into category (category_id, name, parents) values (1f2914de-0adf-4afc-b7ad-ddd8dc876ab1, 'Parent2', null);
      
      insert into category (category_id, name, parents) values (1f93bc07-9874-42a5-a7d1-b741dc9c509c, 'Child', {009fe0e1-5b09-4efc-a92d-c03720324a4f, 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1 });
      
      cqlsh:test> select * from category;
      
       category_id                          | name    | parents
      --------------------------------------+---------+------------------------------------------------------------------------------
       009fe0e1-5b09-4efc-a92d-c03720324a4f |  Parent |                                                                         null
       1f2914de-0adf-4afc-b7ad-ddd8dc876ab1 | Parent2 |                                                                         null
       1f93bc07-9874-42a5-a7d1-b741dc9c509c |   Child | {009fe0e1-5b09-4efc-a92d-c03720324a4f, 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1}
      
      (3 rows)
      
      

      Given the following CQL to select the child category, utilizing an explode function, I would expect to get back 2 rows, 1 for each parent:

      select category_id, name, explode(parents) as parent_id from category where category_id = 1f93bc07-9874-42a5-a7d1-b741dc9c509c;
      
      category_id                          | name  | parent_id
      --------------------------------------+-------+--------------------------------------
      1f93bc07-9874-42a5-a7d1-b741dc9c509c | Child | 009fe0e1-5b09-4efc-a92d-c03720324a4f
      1f93bc07-9874-42a5-a7d1-b741dc9c509c | Child | 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1
      
      (2 rows)
      

      This functionality would ideally apply to materialized views, since the ability to control partitioning here would allow us to efficiently query our MV for all categories belonging to a parent in a complex taxonomy.

      CREATE MATERIALIZED VIEW categories_by_parent as
      SELECT explode(parents) as parent_id,
              category_id, name FROM category WHERE parents IS NOT NULL
      

      The explode() function is available in Spark Dataframes and my proposed function has the same behavior: http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.explode

      Attachments

        Activity

          People

            Unassigned Unassigned
            rustyrazorblade Jon Haddad
            Votes:
            2 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

              Created:
              Updated: