Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1150

Create the a new DynamicRecordType, avoiding star expansion when working with this type

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.8.0
    • Component/s: None
    • Labels:
      None

      Description

      DynamicRecordType can be used to leverage user-provided field implications to avoid schema analysis until execution.

        Activity

        Hide
        julianhyde Julian Hyde added a comment -

        This feature will be useful, and as I have said to Jacques Nadeau would fit well within Calcite, but it’s a bit shapeless to me right now. I would like to see some validator tests (any maybe one or two sql-to-rel converter tests) so I get a feel for how it would work. When you have some tests can you post a link to the github branch? I don’t think you should charge ahead with the implementation because I might not agree with the specification (i.e. the test cases).

        Show
        julianhyde Julian Hyde added a comment - This feature will be useful, and as I have said to Jacques Nadeau would fit well within Calcite, but it’s a bit shapeless to me right now. I would like to see some validator tests (any maybe one or two sql-to-rel converter tests) so I get a feel for how it would work. When you have some tests can you post a link to the github branch? I don’t think you should charge ahead with the implementation because I might not agree with the specification (i.e. the test cases).
        Hide
        jni Jinfeng Ni added a comment -

        Agreed with Julian Hyde 's comment.

        I put a short doc describing the expected behavior (or specification), to support the notion of "schema-on-read", "dynamic star" column in Calcite. Julian Hyde, Jacques Nadeau, could you please take a look? Once we agree with the specification, we could go further with the implementation detail.

        Here is the link to google doc:

        https://docs.google.com/document/d/1vCWlqRyJQCtYbtVAjGOKP-8BD4_hrhoM9-4qbdoJs6k/edit?usp=sharing

        Show
        jni Jinfeng Ni added a comment - Agreed with Julian Hyde 's comment. I put a short doc describing the expected behavior (or specification), to support the notion of "schema-on-read", "dynamic star" column in Calcite. Julian Hyde , Jacques Nadeau , could you please take a look? Once we agree with the specification, we could go further with the implementation detail. Here is the link to google doc: https://docs.google.com/document/d/1vCWlqRyJQCtYbtVAjGOKP-8BD4_hrhoM9-4qbdoJs6k/edit?usp=sharing
        Hide
        julianhyde Julian Hyde added a comment -

        I am swamped for the next couple of days but I'll review as soon as I can.

        Show
        julianhyde Julian Hyde added a comment - I am swamped for the next couple of days but I'll review as soon as I can.
        Hide
        jni Jinfeng Ni added a comment -

        If it's easier to review new unit tests, here is the link to work-in-progress branch [1]. In the last commit, I added couple of unit testcases in SqlToRelConverterTest.

        The branch is not ready for complete review, as I'm still working on both the implementation and adding new unit tests.

        [1] https://github.com/jinfengni/incubator-optiq/commits/InProgress/CALCITE-1150

        Show
        jni Jinfeng Ni added a comment - If it's easier to review new unit tests, here is the link to work-in-progress branch [1] . In the last commit, I added couple of unit testcases in SqlToRelConverterTest. The branch is not ready for complete review, as I'm still working on both the implementation and adding new unit tests. [1] https://github.com/jinfengni/incubator-optiq/commits/InProgress/CALCITE-1150
        Hide
        julianhyde Julian Hyde added a comment -

        I know it's a work in progress, but it's looking good. There are a few things I'd do slightly differently:

        • Move SqlSelect.expanded into the validator, maybe in SelectScope. It is best if validation does not mutate the parse tree. (Yes, I know expansion changes the select list but don't want to make that hole of technical debt any deeper.)
        • RelDataTypes are immutable, and your DynamicRecordType is mutable. I know you need a "collector". I could accept using a mutable type just during validation, then switch to an immutable dynamic record type on sql-to-rel. But if the collector could be in a scope that would be even better.
        • Do you really need "" to be a field in the record type? Is it not sufficient for "" to be a field reference?
        • How do you plan to represent the row type of SELECT * FROM (SELECT * FROM t1), (SELECT * FROM t2))? There are unresolved stars on both sides.
        • Given RelDataType t, I'd prefer t.isDynamicStruct() to t instanceof DynamicRecordType.
        • Minor nit-pick: use new ArrayList<>() rather than Lists.newArrayList() for new code.
        • Add a sql-to-rel test SELECT * FROM nonDynamicTable WHERE EXISTS (SELECT * FROM dynamicTable). The select clause of an EXISTS sub-query should always be ignored.
        Show
        julianhyde Julian Hyde added a comment - I know it's a work in progress, but it's looking good. There are a few things I'd do slightly differently: Move SqlSelect.expanded into the validator, maybe in SelectScope. It is best if validation does not mutate the parse tree. (Yes, I know expansion changes the select list but don't want to make that hole of technical debt any deeper.) RelDataTypes are immutable, and your DynamicRecordType is mutable. I know you need a "collector". I could accept using a mutable type just during validation, then switch to an immutable dynamic record type on sql-to-rel. But if the collector could be in a scope that would be even better. Do you really need " " to be a field in the record type? Is it not sufficient for " " to be a field reference? How do you plan to represent the row type of SELECT * FROM (SELECT * FROM t1), (SELECT * FROM t2)) ? There are unresolved stars on both sides. Given RelDataType t , I'd prefer t.isDynamicStruct() to t instanceof DynamicRecordType . Minor nit-pick: use new ArrayList<>() rather than Lists.newArrayList() for new code. Add a sql-to-rel test SELECT * FROM nonDynamicTable WHERE EXISTS (SELECT * FROM dynamicTable) . The select clause of an EXISTS sub-query should always be ignored.
        Hide
        jni Jinfeng Ni added a comment -

        Julian Hyde, thank you for your review comments! I'll revise the patch based on your comments.

        Show
        jni Jinfeng Ni added a comment - Julian Hyde , thank you for your review comments! I'll revise the patch based on your comments.
        Hide
        jni Jinfeng Ni added a comment -

        Julian Hyde, I revised the patch based on your comments.

        • Move SqlSelect.expanded into the validator, maybe in SelectScope. It is best if validation does not mutate the parse tree. (Yes, I know expansion changes the select list but don't want to make that hole of technical debt any deeper.)
          • Removed the code of SqlSelect.expanded. Previously, this flag is needed to avoid expanding select list multiple times. The new code will represent the static star and dynamic star field differently. This makes that code unnecessary. However, SqlValidatorImpl still has to expand select list, order by exp, where/join condition, and group by exp. In some sense, the expand is part of validation process : we validate and replace the original form with a "validated" form. Validator's "originalExprs" is for the purpose of keeping track of the "original" vs "expanded" expression.
        • RelDataTypes are immutable, and your DynamicRecordType is mutable. I know you need a "collector". I could accept using a mutable type just during validation, then switch to an immutable dynamic record type on sql-to-rel. But if the collector could be in a scope that would be even better.
          • Per your suggestion, I use DynamicRecordType only in validation phase. In RelOptTable.toRel(), when the rowType is dynamicStruct, we switch to an immutable and regular record type.
        • Do you really need "" to be a field in the record type? Is it not sufficient for "" to be a field reference?
          • I'm not sure if I understand the question. But I choose to use "**" as the dynamic star prefix, to differentiate from static star "". Also, introduce a new SqlTypeName for dynamic star field. This new SqlTypeName is useful to identify the dynamic star fields in the rowType. When the physical plan is built, we rely on such knowledge to prepare the dynamic expansion in execution time.
        • How do you plan to represent the row type of SELECT * FROM (SELECT * FROM t1), (SELECT * FROM t2))? There are unresolved stars on both sides.
          • I added one unit test for such case. Basically, the top project will have two dynamic star fields. One is named as "*", the other one is named as "*0", due to naming conflict. The * column in outer query block is actually treated as a regular star column; it's expanded into two dynamic star fields in validation phase. It will also resolve correctly for SELECT * FROM (SELECT *, col1 FROM t1), (SELECT *, col2 FROM t2)).
        • Given RelDataType t, I'd prefer t.isDynamicStruct() to t instanceof DynamicRecordType.
          • Done. Add isDynamicStruct() to interface RelDataType.
        • Minor nit-pick: use new ArrayList<>() rather than Lists.newArrayList() for new code.
          • Done.
        • Add a sql-to-rel test SELECT * FROM nonDynamicTable WHERE EXISTS (SELECT * FROM dynamicTable). The select clause of an EXISTS sub-query should always be ignored.
          • Done. The select clause will be ignored, although the scan will return all the columns. That's the same behavior for the case where static table is used in the exist sub-query.

        I also modify couple of expected error messages in SqlValidatorTest. According to the code, the error message should use the original expression, in stead of expanded expression.

        Show
        jni Jinfeng Ni added a comment - Julian Hyde , I revised the patch based on your comments. Move SqlSelect.expanded into the validator, maybe in SelectScope. It is best if validation does not mutate the parse tree. (Yes, I know expansion changes the select list but don't want to make that hole of technical debt any deeper.) Removed the code of SqlSelect.expanded. Previously, this flag is needed to avoid expanding select list multiple times. The new code will represent the static star and dynamic star field differently. This makes that code unnecessary. However, SqlValidatorImpl still has to expand select list, order by exp, where/join condition, and group by exp. In some sense, the expand is part of validation process : we validate and replace the original form with a "validated" form. Validator's "originalExprs" is for the purpose of keeping track of the "original" vs "expanded" expression. RelDataTypes are immutable, and your DynamicRecordType is mutable. I know you need a "collector". I could accept using a mutable type just during validation, then switch to an immutable dynamic record type on sql-to-rel. But if the collector could be in a scope that would be even better. Per your suggestion, I use DynamicRecordType only in validation phase. In RelOptTable.toRel(), when the rowType is dynamicStruct, we switch to an immutable and regular record type. Do you really need "" to be a field in the record type? Is it not sufficient for "" to be a field reference? I'm not sure if I understand the question. But I choose to use "**" as the dynamic star prefix, to differentiate from static star "". Also, introduce a new SqlTypeName for dynamic star field. This new SqlTypeName is useful to identify the dynamic star fields in the rowType. When the physical plan is built, we rely on such knowledge to prepare the dynamic expansion in execution time. How do you plan to represent the row type of SELECT * FROM (SELECT * FROM t1), (SELECT * FROM t2))? There are unresolved stars on both sides. I added one unit test for such case. Basically, the top project will have two dynamic star fields. One is named as "* ", the other one is named as " *0", due to naming conflict. The * column in outer query block is actually treated as a regular star column; it's expanded into two dynamic star fields in validation phase. It will also resolve correctly for SELECT * FROM (SELECT *, col1 FROM t1), (SELECT *, col2 FROM t2)). Given RelDataType t, I'd prefer t.isDynamicStruct() to t instanceof DynamicRecordType. Done. Add isDynamicStruct() to interface RelDataType. Minor nit-pick: use new ArrayList<>() rather than Lists.newArrayList() for new code. Done. Add a sql-to-rel test SELECT * FROM nonDynamicTable WHERE EXISTS (SELECT * FROM dynamicTable). The select clause of an EXISTS sub-query should always be ignored. Done. The select clause will be ignored, although the scan will return all the columns. That's the same behavior for the case where static table is used in the exist sub-query. I also modify couple of expected error messages in SqlValidatorTest. According to the code, the error message should use the original expression, in stead of expanded expression.
        Hide
        jni Jinfeng Ni added a comment -
        Show
        jni Jinfeng Ni added a comment - Here is the PR link: https://github.com/apache/calcite/pull/228
        Hide
        julianhyde Julian Hyde added a comment -

        DynamicRecordTypeImpl's constructor calls computeDigest. But the columns may later change. It's worth making a note that the digest is not to be relied upon. In fact DynamicRecordTypeImpl.equals should almost certainly not call super.equals.

        Some cosmetic stuff:

        • In Javadoc, can you change method javadoc from "Return ..." to "Returns ...", "Create ..." to "Creates ..." etc. Also add <p> between paragraphs.
        • Please use // rather than /* comments inside methods.
        • Make RelOptTableImpl.toRel recurse once if the row type is dynamic. I think it would be clearer.
        • Remove RelDataTypeHolder.equals and .hashCode since they are the same as for Object

        I'm working on CALCITE-1208 and my changes will probably conflict with yours but I guess I'll deal with that.

        Show
        julianhyde Julian Hyde added a comment - DynamicRecordTypeImpl's constructor calls computeDigest. But the columns may later change. It's worth making a note that the digest is not to be relied upon. In fact DynamicRecordTypeImpl.equals should almost certainly not call super.equals. Some cosmetic stuff: In Javadoc, can you change method javadoc from "Return ..." to "Returns ...", "Create ..." to "Creates ..." etc. Also add <p> between paragraphs. Please use // rather than /* comments inside methods. Make RelOptTableImpl.toRel recurse once if the row type is dynamic. I think it would be clearer. Remove RelDataTypeHolder.equals and .hashCode since they are the same as for Object I'm working on CALCITE-1208 and my changes will probably conflict with yours but I guess I'll deal with that.
        Hide
        jni Jinfeng Ni added a comment -

        Thanks for your comments, Julian Hyde. I modified PR based on your comments.

        • Modified DynamicRecordTypeImpl such that digest is computed in constructor as well as when a new field is added. This will make sure digest is reflecting the latest field list. I also remove equals() and hashCode() for DynamicRecordTypeImpl, since it could use parent's corresponding code (which uses digest)
        • Modify Javadoc.
        • Changed the comments inside methods
        • Changed RelOptTableImpl.toRel to use recursive call in case the row type is dynamic.
        • Removed RelDataTypeHolder.equals and hashCode.

        The change is put in a separate commit to make it easier to review. We can squash if it looks good. Thanks!

        https://github.com/apache/calcite/pull/228

        Show
        jni Jinfeng Ni added a comment - Thanks for your comments, Julian Hyde . I modified PR based on your comments. Modified DynamicRecordTypeImpl such that digest is computed in constructor as well as when a new field is added. This will make sure digest is reflecting the latest field list. I also remove equals() and hashCode() for DynamicRecordTypeImpl, since it could use parent's corresponding code (which uses digest) Modify Javadoc. Changed the comments inside methods Changed RelOptTableImpl.toRel to use recursive call in case the row type is dynamic. Removed RelDataTypeHolder.equals and hashCode. The change is put in a separate commit to make it easier to review. We can squash if it looks good. Thanks! https://github.com/apache/calcite/pull/228
        Hide
        julianhyde Julian Hyde added a comment -

        Great. +1. Can you please squash into a single commit and push to master?

        Show
        julianhyde Julian Hyde added a comment - Great. +1. Can you please squash into a single commit and push to master?
        Show
        jni Jinfeng Ni added a comment - Fixed in https://git1-us-west.apache.org/repos/asf?p=calcite.git;a=commit;h=9bd7d7550cfe31b76e3970a532014a8186e1ff5a
        Hide
        julianhyde Julian Hyde added a comment -

        Resolved in release 1.8.0 (2016-06-13).

        Show
        julianhyde Julian Hyde added a comment - Resolved in release 1.8.0 (2016-06-13).

          People

          • Assignee:
            jni Jinfeng Ni
            Reporter:
            jnadeau Jacques Nadeau
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development