[CASSANDRA-13857] Allow MV with only partition key - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: In Progress
Priority: Normal
Resolution: Unresolved
Fix Version/s: 5.x
Component/s: Feature/Materialized Views
Labels:
None

Severity:
Normal

Description

We currently disallow creation of a view that has the exact same primary key as the base where no clustering keys are present, however a potential use case would be a view where part of the PK is filtered so as to have a subset of data in the view which is faster for range queries. We actually currently allow this, but only if you have a clustering key defined. If you only have a partitioning key it's not possible.

From the mailing list, the below example works:

CREATE TABLE users (
  site_id int,
  user_id text,
  n int,
  data set<frozen<text>>,
  PRIMARY KEY ((site_id, user_id), n));

user data is updated and read by PK and sometimes I have to fetch all user for some specific site_id. It appeared that full scan by token(site_id,user_id) filtered by WHERE site_id = <some id> works much slower than unfiltered full scan on
CREATE MATERIALIZED VIEW users_1 AS
SELECT site_id, user_id, n, data
FROM users
WHERE site_id = 1 AND user_id IS NOT NULL AND n IS NOT NULL
PRIMARY KEY ((site_id, user_id), n);

However the following does not:

CREATE TABLE users (
site_id int,
user_id text,
data set<text>,
PRIMARY KEY ((site_id, user_id)));

CREATE MATERIALIZED VIEW users_1 AS
SELECT site_id, user_id, data
FROM users
WHERE site_id = 1 AND user_id IS NOT NULL 
PRIMARY KEY ((site_id, user_id));
InvalidRequest: Error from server: code=2200 [Invalid query] message="No columns are defined for Materialized View other than primary key"

This is because if the clustering key is empty we assume they've only defined the primary key in the partition key and we haven't accounted for this use case.

On that note, we also don't allow the following narrowing of the partition key:

CREATE TABLE kurt.base (
    id int,
    uid text,
    data text,
    PRIMARY KEY (id, uid)
) 

CREATE MATERIALIZED VIEW kurt.mv2 AS SELECT * from kurt.base where id IS NOT NULL and uid='1' PRIMARY KEY ((id, uid));
InvalidRequest: Error from server: code=2200 [Invalid query] message="No columns are defined for Materialized View other than primary key"

But we do allow the following, which works because there is still a clustering key, despite not changing the PK.

CREATE MATERIALIZED VIEW kurt.mv2 AS SELECT * from kurt.base where id IS NOT NULL and uid='1' PRIMARY KEY (id, uid);

And we also allow the following, which is a narrowing of the partition key as above, but with an extra clustering key.

create table kurt.base3 (id int, uid int, clus1 int, clus2 int, data text, PRIMARY KEY ((id, uid), clus1, clus2));

CREATE MATERIALIZED VIEW kurt.mv4 AS SELECT * from kurt.base3 where id IS NOT NULL and uid IS NOT NULL and clus1 IS NOT NULL AND clus2 IS NOT NULL  PRIMARY KEY ((id, uid, clus1), clus2);

I think supporting these cases is trivial and mostly already handled in the underlying MV write path, so we might be able to get away with just a simple change of this condition.

Attachments

Activity

People

Assignee:: Stefan Miklosovic

Reporter:: Kurt Greaves

Authors:: Stefan Miklosovic

Votes:: 0 Vote for this issue

Watchers:: 7 Start watching this issue

Dates

Created:: 08/Sep/17 03:31

Updated:: 07/Mar/23 10:53