Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:

      Description

      Our current view implementation only supports views over a single table. We should enhance this to support creating a view over multiple tables. For example: CREATE VIEW v AS SELECT * FROM DEPT d, EMPL e WHERE d.dept_id = e.dept_id

        Activity

        Hide
        julianhyde Julian Hyde added a comment -

        In the Calcite branch, defining views and expanding them in queries is very straightforward. I suggest that we allow arbitrary SELECT statements in views (JOIN, UNION, GROUP BY, and even ORDER BY), so this could cover PHOENIX-1506 and PHOENIX-1507 also.

        The only wrinkle is DML. Complex views won't support DML, because there may not be a well-defined row underlying each output row. DML-capable views have an underlying table (or view, which must be DML-capable).

        Calcite can figure out default expressions for all columns that are filtered & projected away by a DML-capable view. (This is by definition. If Calcite can't figure it out, the view is not considered DML-capable.)

        Also, DML-incapable views do not allow schema extensions (the EXTEND clause).

        In a project view that is DML-capable, some columns might not have base columns. For example, CREATE VIEW v AS SELECT empno, deptno, 10 as ten FROM Emp. You would not be able to specify ten in a DML statement.

        I have see several JIRA cases talking about indexes on views. In the Calcite branch, do these provide any advantages over indexes on tables? If I'd defined an index, I would hope that my query would use it, and my DML statement would cause it to be modified, regardless of whether my statement references the table or the view.

        Show
        julianhyde Julian Hyde added a comment - In the Calcite branch, defining views and expanding them in queries is very straightforward. I suggest that we allow arbitrary SELECT statements in views (JOIN, UNION, GROUP BY, and even ORDER BY), so this could cover PHOENIX-1506 and PHOENIX-1507 also. The only wrinkle is DML. Complex views won't support DML, because there may not be a well-defined row underlying each output row. DML-capable views have an underlying table (or view, which must be DML-capable). Calcite can figure out default expressions for all columns that are filtered & projected away by a DML-capable view. (This is by definition. If Calcite can't figure it out, the view is not considered DML-capable.) Also, DML-incapable views do not allow schema extensions (the EXTEND clause). In a project view that is DML-capable, some columns might not have base columns. For example, CREATE VIEW v AS SELECT empno, deptno, 10 as ten FROM Emp . You would not be able to specify ten in a DML statement. I have see several JIRA cases talking about indexes on views. In the Calcite branch, do these provide any advantages over indexes on tables? If I'd defined an index, I would hope that my query would use it, and my DML statement would cause it to be modified, regardless of whether my statement references the table or the view.
        Hide
        jamestaylor James Taylor added a comment -

        Indexes on views are for updatable views that allow columns to be defined. It wouldn't make sense to add them on read-only views - not sure if that's enforced or not, but hopefully it is.

        The complication in supporting other more complex queries in is tracking the multiple parents such that a CREATE VIEW v AS SELECT * ... projects all the right columns as the underlying tables metadata change. We have this ability in the way we represent this in our underlying Phoenix table, but the code likely makes some assumptions that a view has only a single parent. It's all very doable - we just need someone to volunteer to do the work.

        Show
        jamestaylor James Taylor added a comment - Indexes on views are for updatable views that allow columns to be defined. It wouldn't make sense to add them on read-only views - not sure if that's enforced or not, but hopefully it is. The complication in supporting other more complex queries in is tracking the multiple parents such that a CREATE VIEW v AS SELECT * ... projects all the right columns as the underlying tables metadata change. We have this ability in the way we represent this in our underlying Phoenix table, but the code likely makes some assumptions that a view has only a single parent. It's all very doable - we just need someone to volunteer to do the work.
        Hide
        julianhyde Julian Hyde added a comment - - edited

        The concept of a "view parents" can be simplified when we move to Calcite. A view either has a base table (in which case, it is DML-capable) or it does not. The parent is never a view.

        Indexes on views should be at most syntactic sugar for functional indexes. For example,

        create table t (a, b, c, d);
        create table v as select a, b + c as e from t where d = 5;
        create index iv on v (a, e);
        create index it on t (a, b + c);
        

        I would expect v and iv to have identical contents, and either could be used to answer the query

        select a from t where b + c > 10
        Show
        julianhyde Julian Hyde added a comment - - edited The concept of a "view parents" can be simplified when we move to Calcite. A view either has a base table (in which case, it is DML-capable) or it does not. The parent is never a view. Indexes on views should be at most syntactic sugar for functional indexes. For example, create table t (a, b, c, d); create table v as select a, b + c as e from t where d = 5; create index iv on v (a, e); create index it on t (a, b + c); I would expect v and iv to have identical contents, and either could be used to answer the query select a from t where b + c > 10
        Hide
        jamestaylor James Taylor added a comment -

        I'm sure that Calcite can handle lots of interesting view definitions at compile time, but Phoenix still needs to manage the metadata. That doesn't change when we move to Calcite. Either with or without Calcite, Phoenix still needs to manage the metadata and the hierarchy of tables/views, which columns they define, etc. For example, if you have a table and view like this:

        create table t (a, b, c, d);
        create view v1(e varchar, f varchar) as select * from t where d = 5;
        

        Then you drop column b from t, then we have to track that in our system catalog (and that the projection of the view doesn't include that column either).

        Just out of curiosity, can't a view be derived from another view? Isn't this ok?

        create view v2 as select * from v1 where e = 'foo';
        
        Show
        jamestaylor James Taylor added a comment - I'm sure that Calcite can handle lots of interesting view definitions at compile time, but Phoenix still needs to manage the metadata. That doesn't change when we move to Calcite. Either with or without Calcite, Phoenix still needs to manage the metadata and the hierarchy of tables/views, which columns they define, etc. For example, if you have a table and view like this: create table t (a, b, c, d); create view v1(e varchar, f varchar) as select * from t where d = 5; Then you drop column b from t, then we have to track that in our system catalog (and that the projection of the view doesn't include that column either). Just out of curiosity, can't a view be derived from another view? Isn't this ok? create view v2 as select * from v1 where e = 'foo';
        Hide
        julianhyde Julian Hyde added a comment -

        I agree you have to track dependencies. A table or view can have 0 or more dependencies that reference it and need to be re-compiled (or re-validated) if that table or view is changed. If a view can reference multiple tables, then the dependency graph becomes a DAG, not a hierarchy.

        Having a dependencies table (dependency_object_id, dependent_object_id) is pretty typical in DBMSs I've seen.

        Yes, a view can be derived from another view. I would make v2 depend on v1 so that if v1 changes (e.g. if its underlying table changes) then transitively v2 will need to change.

        Regarding the "select *" case. In LucidDB we chose a behavior that if you defined "create view v as select * from t", and added a column to t, the view would not acquire the column until you re-created the view. In other words, the * is expanded at creation time. There are fewer surprises that way.

        I see SPARK-18209 coming to similar conclusions.

        Show
        julianhyde Julian Hyde added a comment - I agree you have to track dependencies. A table or view can have 0 or more dependencies that reference it and need to be re-compiled (or re-validated) if that table or view is changed. If a view can reference multiple tables, then the dependency graph becomes a DAG, not a hierarchy. Having a dependencies table (dependency_object_id, dependent_object_id) is pretty typical in DBMSs I've seen. Yes, a view can be derived from another view. I would make v2 depend on v1 so that if v1 changes (e.g. if its underlying table changes) then transitively v2 will need to change. Regarding the "select *" case. In LucidDB we chose a behavior that if you defined "create view v as select * from t", and added a column to t, the view would not acquire the column until you re-created the view. In other words, the * is expanded at creation time. There are fewer surprises that way. I see SPARK-18209 coming to similar conclusions.

          People

          • Assignee:
            Unassigned
            Reporter:
            jamestaylor James Taylor
          • Votes:
            6 Vote for this issue
            Watchers:
            11 Start watching this issue

            Dates

            • Created:
              Updated:

              Development