Uploaded image for project: 'Apache Cassandra'
  1. Apache Cassandra
  2. CASSANDRA-13857

Allow MV with only partition key

    XMLWordPrintableJSON

Details

    • Normal

    Description

      We currently disallow creation of a view that has the exact same primary key as the base where no clustering keys are present, however a potential use case would be a view where part of the PK is filtered so as to have a subset of data in the view which is faster for range queries. We actually currently allow this, but only if you have a clustering key defined. If you only have a partitioning key it's not possible.

      From the mailing list, the below example works:

      CREATE TABLE users (
        site_id int,
        user_id text,
        n int,
        data set<frozen<text>>,
        PRIMARY KEY ((site_id, user_id), n));
      
      user data is updated and read by PK and sometimes I have to fetch all user for some specific site_id. It appeared that full scan by token(site_id,user_id) filtered by WHERE site_id = <some id> works much slower than unfiltered full scan on
      CREATE MATERIALIZED VIEW users_1 AS
      SELECT site_id, user_id, n, data
      FROM users
      WHERE site_id = 1 AND user_id IS NOT NULL AND n IS NOT NULL
      PRIMARY KEY ((site_id, user_id), n);
      

      However the following does not:

      CREATE TABLE users (
      site_id int,
      user_id text,
      data set<text>,
      PRIMARY KEY ((site_id, user_id)));
      
      CREATE MATERIALIZED VIEW users_1 AS
      SELECT site_id, user_id, data
      FROM users
      WHERE site_id = 1 AND user_id IS NOT NULL 
      PRIMARY KEY ((site_id, user_id));
      InvalidRequest: Error from server: code=2200 [Invalid query] message="No columns are defined for Materialized View other than primary key"
      

      This is because if the clustering key is empty we assume they've only defined the primary key in the partition key and we haven't accounted for this use case.

      On that note, we also don't allow the following narrowing of the partition key:

      CREATE TABLE kurt.base (
          id int,
          uid text,
          data text,
          PRIMARY KEY (id, uid)
      ) 
      
      CREATE MATERIALIZED VIEW kurt.mv2 AS SELECT * from kurt.base where id IS NOT NULL and uid='1' PRIMARY KEY ((id, uid));
      InvalidRequest: Error from server: code=2200 [Invalid query] message="No columns are defined for Materialized View other than primary key"
      

      But we do allow the following, which works because there is still a clustering key, despite not changing the PK.

      CREATE MATERIALIZED VIEW kurt.mv2 AS SELECT * from kurt.base where id IS NOT NULL and uid='1' PRIMARY KEY (id, uid);
      

      And we also allow the following, which is a narrowing of the partition key as above, but with an extra clustering key.

      create table kurt.base3 (id int, uid int, clus1 int, clus2 int, data text, PRIMARY KEY ((id, uid), clus1, clus2));
      
      CREATE MATERIALIZED VIEW kurt.mv4 AS SELECT * from kurt.base3 where id IS NOT NULL and uid IS NOT NULL and clus1 IS NOT NULL AND clus2 IS NOT NULL  PRIMARY KEY ((id, uid, clus1), clus2);
      

      I think supporting these cases is trivial and mostly already handled in the underlying MV write path, so we might be able to get away with just a simple change of this condition.

      Attachments

        Activity

          People

            stefan.miklosovic Stefan Miklosovic
            KurtG Kurt Greaves
            Stefan Miklosovic
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated: