Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1940

Implement dialect specific support for sequences

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The Calcite parser and validator already supports sequences but the push down to the JDBC level is currently limited. SInce sequences are not supported on all DBMS every sql dialect should have the possibility to render it's own way of extracting sequence information or incrementing the value.

        Issue Links

          Activity

          Hide
          christian.beikov Christian Beikov added a comment -

          See the following PR for a fix: https://github.com/apache/calcite/pull/514

          Show
          christian.beikov Christian Beikov added a comment - See the following PR for a fix: https://github.com/apache/calcite/pull/514
          Hide
          julianhyde Julian Hyde added a comment -

          Good idea. Reviewing https://github.com/apache/calcite/pull/514/commits/99e6ca890431d5f55be23924844f6d3f0058f364:

          • Why do you need JdbcTable.jdbcTableName?
          • The code in CalcitePrepareImpl that grabs a SqlDialect (in fact the idea of a unique SqlDialect for the whole query) is dubious. What the query is a union between a MySQL database and a PostgreSQL database?
          • I like the idea of interface Sequence, and also JdbcSequence extends JdbcTable implements Sequence is just right
          • We use camel case in class names, even those containing acronyms. Please rename DB2SequenceSupport to Db2SequenceSupport, MSSQLSequenceSupportResolver to MssqlSequenceSupportResolver
          • Don't use FakeUtil.newInternal; just use new RuntimeException

          Should SequenceSupport be rolled into Handler? SequenceSupport.unparseSequenceVal definitely fits into Handler.unparseCall. SequenceSupport.extract could go to either Handler or SqlDialect. Chris Baynes, what do you think?

          Show
          julianhyde Julian Hyde added a comment - Good idea. Reviewing https://github.com/apache/calcite/pull/514/commits/99e6ca890431d5f55be23924844f6d3f0058f364: Why do you need JdbcTable.jdbcTableName ? The code in CalcitePrepareImpl that grabs a SqlDialect (in fact the idea of a unique SqlDialect for the whole query) is dubious. What the query is a union between a MySQL database and a PostgreSQL database? I like the idea of interface Sequence , and also JdbcSequence extends JdbcTable implements Sequence is just right We use camel case in class names, even those containing acronyms. Please rename DB2SequenceSupport to Db2SequenceSupport , MSSQLSequenceSupportResolver to MssqlSequenceSupportResolver Don't use FakeUtil.newInternal ; just use new RuntimeException Should SequenceSupport be rolled into Handler ? SequenceSupport.unparseSequenceVal definitely fits into Handler.unparseCall . SequenceSupport.extract could go to either Handler or SqlDialect . Chris Baynes , what do you think?
          Hide
          christian.beikov Christian Beikov added a comment -

          Seems I only use JdbcTable.jdbcTableName to get a proper toString representation for the sequence.
          I wasn't sure how to get the dialect during unparsing and since the writer already had a getter, I thought it might be appropriate to just get the proper one in. I suppose the writer should be extended to be able to return a dialect for a schema name?
          I'll adapt the PR to the requested changes and let you know when it's done.

          Show
          christian.beikov Christian Beikov added a comment - Seems I only use JdbcTable.jdbcTableName to get a proper toString representation for the sequence. I wasn't sure how to get the dialect during unparsing and since the writer already had a getter, I thought it might be appropriate to just get the proper one in. I suppose the writer should be extended to be able to return a dialect for a schema name? I'll adapt the PR to the requested changes and let you know when it's done.
          Hide
          julianhyde Julian Hyde added a comment -

          Do you think we could get this done for 1.14? OK if not.

          Show
          julianhyde Julian Hyde added a comment - Do you think we could get this done for 1.14? OK if not.
          Hide
          christian.beikov Christian Beikov added a comment -

          Depends on what you want changed. I was kind of waiting for a comment from Chris Baynes on how to do that best.
          The part about retrieving the dialect during unparsing is still unclear to me. How should that be done? Currently I am, as you stated, wrongly assuming a single dialect.

          Show
          christian.beikov Christian Beikov added a comment - Depends on what you want changed. I was kind of waiting for a comment from Chris Baynes on how to do that best. The part about retrieving the dialect during unparsing is still unclear to me. How should that be done? Currently I am, as you stated, wrongly assuming a single dialect.
          Hide
          chris-baynes Chris Baynes added a comment -

          Sequence support is nice.
          I agree that SequenceSupport.unparseSequenceVal could definitely be merged into Handler.unparseCall with switches for NEXT_VALUE and CURRENT_VALUE.
          How about a bool flag on the BaseHandler which specifies if a dialect has sequence support, false by default, but dialect specific handlers can override it. Then moving SequenceSupport.extract method into the BaseHandler.
          That avoids having to change the DatabaseProduct constructor.

          Also, would be nice to have some tests for the unparsing for each dialect that needs it in RelToSqlConverterTest.

          About the CalcitePrepareImpl - perhaps I'm missing something, but I don't quite understand why you need to try to extract the dialect there.

          Show
          chris-baynes Chris Baynes added a comment - Sequence support is nice. I agree that SequenceSupport.unparseSequenceVal could definitely be merged into Handler.unparseCall with switches for NEXT_VALUE and CURRENT_VALUE . How about a bool flag on the BaseHandler which specifies if a dialect has sequence support, false by default, but dialect specific handlers can override it. Then moving SequenceSupport.extract method into the BaseHandler . That avoids having to change the DatabaseProduct constructor. Also, would be nice to have some tests for the unparsing for each dialect that needs it in RelToSqlConverterTest . About the CalcitePrepareImpl - perhaps I'm missing something, but I don't quite understand why you need to try to extract the dialect there.
          Hide
          christian.beikov Christian Beikov added a comment -

          I need the dialect since I have to call the unparse method on it. If we have a query like e.g. "select next value for s1.seq, next value for s2.seq" and s1 and s2 are schemas that have different dialects, I have to unparse the sequence call once for the dialect of s1 and once for s2. I somehow need to know in the SqlWriter which dialect a schema uses so that I can call the appropriate unparse method.
          Could you(Chris Baynes) maybe comment on the mailing list thread "JdbcAdapter Sort Limit and Offset" which is somewhat related in the overall discussion about how to handle the different dialect quirks. I'd like to essentially merge the Handler and other methods into a DBMS specific SqlDialect implementation and do the dialect construction through a configurable factory.

          Show
          christian.beikov Christian Beikov added a comment - I need the dialect since I have to call the unparse method on it. If we have a query like e.g. "select next value for s1.seq, next value for s2.seq" and s1 and s2 are schemas that have different dialects, I have to unparse the sequence call once for the dialect of s1 and once for s2. I somehow need to know in the SqlWriter which dialect a schema uses so that I can call the appropriate unparse method. Could you( Chris Baynes ) maybe comment on the mailing list thread "JdbcAdapter Sort Limit and Offset" which is somewhat related in the overall discussion about how to handle the different dialect quirks. I'd like to essentially merge the Handler and other methods into a DBMS specific SqlDialect implementation and do the dialect construction through a configurable factory.
          Hide
          chris-baynes Chris Baynes added a comment -

          I tried this out with postgres and noticed a problem - postgres (unlike other dbs) already returns a list of sequences in the jdbc metadata. So the table map building fails because of duplicates. It probably makes sense to just always skip SEQUENCE and TEMPORARY_SEQUENCE in the loop, and only get them from the dialect sequence information.

          I can't really comment right now on your dialect handling improvement, since I'm still not sure why the dialect handling here needs something different than what already exists. The dialect handlers are already able to push-down calls to different dbs as long as the planner finds the right plan.

          Show
          chris-baynes Chris Baynes added a comment - I tried this out with postgres and noticed a problem - postgres (unlike other dbs) already returns a list of sequences in the jdbc metadata. So the table map building fails because of duplicates. It probably makes sense to just always skip SEQUENCE and TEMPORARY_SEQUENCE in the loop, and only get them from the dialect sequence information. I can't really comment right now on your dialect handling improvement, since I'm still not sure why the dialect handling here needs something different than what already exists. The dialect handlers are already able to push-down calls to different dbs as long as the planner finds the right plan.
          Hide
          julianhyde Julian Hyde added a comment - - edited

          I somehow need to know in the SqlWriter which dialect a schema uses so that I can call the appropriate unparse method.

          Run JdbcAdapterTest.testFilterUnionPlan in the debugger with a breakpoint in JdbcToEnumerableConverter.generateSql(SqlDialect). You'll see that the dialect comes from the convention of the input to the JdbcToEnumerableConverter. That input is a JdbcRel, its convention is an instance of JdbcConvention, and JdbcConvention has a sqlDialect field.

          In a hybrid plan (say joining a table in MySQL to a table in PostgreSQL) there will be two instances of JdbcConvention with different dialects.

          So you see the dialect is not associated with the schema. It is associated with the JdbcRel via JdbcConvention.

          It is conceivable that we would send just a JdbcValues to PostgreSQL and generate "select 1 union select 2". There is no schema in play because there is no table yet we still know to use PostgreSQL dialect.

          Show
          julianhyde Julian Hyde added a comment - - edited I somehow need to know in the SqlWriter which dialect a schema uses so that I can call the appropriate unparse method. Run JdbcAdapterTest.testFilterUnionPlan in the debugger with a breakpoint in JdbcToEnumerableConverter.generateSql(SqlDialect) . You'll see that the dialect comes from the convention of the input to the JdbcToEnumerableConverter . That input is a JdbcRel , its convention is an instance of JdbcConvention , and JdbcConvention has a sqlDialect field. In a hybrid plan (say joining a table in MySQL to a table in PostgreSQL) there will be two instances of JdbcConvention with different dialects. So you see the dialect is not associated with the schema. It is associated with the JdbcRel via JdbcConvention . It is conceivable that we would send just a JdbcValues to PostgreSQL and generate "select 1 union select 2". There is no schema in play because there is no table yet we still know to use PostgreSQL dialect.
          Hide
          christian.beikov Christian Beikov added a comment -

          Ok I understand how that works now, I was just confused because the SQL string was generated(apparently for no reason?) within the Prepare class. Anyway, I'm having a different problem now with the sequence implementation. The plan looks like LogicalProject(NEXT VALUE FOR S2) -> LogicalValues( (0) ) and during rule registration, no JDBC rules are registered since there is no JDBC relation used in RelNode. Any hints on how I can achieve that the plan gets converted to JdbcProject(NEXT VALUE FOR S2) -> JdbcValues( (0) )?

          Show
          christian.beikov Christian Beikov added a comment - Ok I understand how that works now, I was just confused because the SQL string was generated(apparently for no reason?) within the Prepare class. Anyway, I'm having a different problem now with the sequence implementation. The plan looks like LogicalProject(NEXT VALUE FOR S2) -> LogicalValues( (0) ) and during rule registration, no JDBC rules are registered since there is no JDBC relation used in RelNode. Any hints on how I can achieve that the plan gets converted to JdbcProject(NEXT VALUE FOR S2) -> JdbcValues( (0) ) ?
          Hide
          christian.beikov Christian Beikov added a comment -

          I introduced a custom RexCall subclass since I need to store the RelNode representing the sequence table, but now I'd need to enforce the usage of the "correct" plan. Currently the optimizer generates plans that use the linq4j implementation which is "wrong". Maybe you could take a look at the PR and tell me how to do that best?

          Show
          christian.beikov Christian Beikov added a comment - I introduced a custom RexCall subclass since I need to store the RelNode representing the sequence table, but now I'd need to enforce the usage of the "correct" plan. Currently the optimizer generates plans that use the linq4j implementation which is "wrong". Maybe you could take a look at the PR and tell me how to do that best?
          Hide
          julianhyde Julian Hyde added a comment -

          I'm not fond of introducing sub-classes of RexCall because there is a lot of code that wouldn't know how to handle them. But in this case it seems to be the least bad option.

          This PR is converging, but not fast enough to make it into 1.14. Let's aim to get this done for 1.15.

          With 71 files changed, the PR is much larger than I expected. It seems that there are changes to the life-cycle of dialects and handlers. I will need to understand whether those changes are necessary or at least beneficial. I am on vacation until tomorrow but will review again when I get back.

          Show
          julianhyde Julian Hyde added a comment - I'm not fond of introducing sub-classes of RexCall because there is a lot of code that wouldn't know how to handle them. But in this case it seems to be the least bad option. This PR is converging, but not fast enough to make it into 1.14. Let's aim to get this done for 1.15. With 71 files changed, the PR is much larger than I expected. It seems that there are changes to the life-cycle of dialects and handlers. I will need to understand whether those changes are necessary or at least beneficial. I am on vacation until tomorrow but will review again when I get back.
          Hide
          christian.beikov Christian Beikov added a comment -

          No worries. I didn't bet on this getting into 1.14 especially since I wanted to tackle the sql dialect specific handling as part of this.
          The lifecycle doesn't change, it's still bound to the schema, the handler was just merged into the dialect and dialect is now self configuring in the constructor. This is only a first step, so please don't look to deep into that yet.
          For now, I would just like to make a RexSeqCall be executed with the appropriate convention. Currently, the call is implemented via RexImpTable and I have no idea how to force that expression be implemented by the JdbcConvention. Any idea how to do that?
          After that, I would replace all the static DatabaseProduct usages with appropriate methods in the dialect to handle implementing the SQL translation. When I'm done with that, you can review it as a whole, but the current state is still incomplete.

          Show
          christian.beikov Christian Beikov added a comment - No worries. I didn't bet on this getting into 1.14 especially since I wanted to tackle the sql dialect specific handling as part of this. The lifecycle doesn't change, it's still bound to the schema, the handler was just merged into the dialect and dialect is now self configuring in the constructor. This is only a first step, so please don't look to deep into that yet. For now, I would just like to make a RexSeqCall be executed with the appropriate convention. Currently, the call is implemented via RexImpTable and I have no idea how to force that expression be implemented by the JdbcConvention. Any idea how to do that? After that, I would replace all the static DatabaseProduct usages with appropriate methods in the dialect to handle implementing the SQL translation. When I'm done with that, you can review it as a whole, but the current state is still incomplete.
          Hide
          julianhyde Julian Hyde added a comment -

          I see you have (a) created a sub-class of SqlDialect per database product, (b) deprecated DatabaseProduct. Can you explain the benefit of this? The current code is more concise and, in my opinion, better. Virtual methods are not the only way to override behavior; what Scala calls "case classes" are another approach, and at present we use a blend of the two, which I think is ideal.

          I support the creation of "handlers" to extend/modify the capabilities of SqlDialect objects. And that means we will have to change how SqlDialect is instantiated.

          Show
          julianhyde Julian Hyde added a comment - I see you have (a) created a sub-class of SqlDialect per database product, (b) deprecated DatabaseProduct. Can you explain the benefit of this? The current code is more concise and, in my opinion, better. Virtual methods are not the only way to override behavior; what Scala calls "case classes" are another approach, and at present we use a blend of the two, which I think is ideal. I support the creation of "handlers" to extend/modify the capabilities of SqlDialect objects. And that means we will have to change how SqlDialect is instantiated.
          Hide
          julianhyde Julian Hyde added a comment -

          There is a discussion about re-factoring SqlDialect in CALCITE-1933. Can we combine the discussions?

          Show
          julianhyde Julian Hyde added a comment - There is a discussion about re-factoring SqlDialect in CALCITE-1933 . Can we combine the discussions?
          Hide
          christian.beikov Christian Beikov added a comment -

          The idea is to encapsulate all the dialect specifics and at the same time making the system pluggable. By doing the non-standard unparsing stuff through dialect methods, it is possible for consumers of calcite to define a custom dialect targeted for their needs. Right now, only sequence fetch rendering is handled that way, but I plan to remove all checks against the database product of a dialect like e.g. dialect.getDatabaseProduct() == DatabaseProduct.MYSQL that are currently spread around the code base. I'd rather have methods that return whether a dialect supports a specific feature or handle rendering/unparsing an expression, clause or whole query than checks in the main code base that only work for specific database products.
          One of the most recent changes of supported features is MySQL since version 8 will have support for e.g. CTEs. How would the current code handle that? The way I am proposing it, the MysqlDialect can self configure during instantiation, detect based on the version if a feature is supported and make the capability accessible to the code that handles the unparsing.

          Apart from that, I'd still need to know how to actually make the planner choose the correct plan. Since there is no Jdbc input to the Project node, the query isn't rendered with the JdbcConvention. Any idea how I could achieve that?

          Show
          christian.beikov Christian Beikov added a comment - The idea is to encapsulate all the dialect specifics and at the same time making the system pluggable. By doing the non-standard unparsing stuff through dialect methods, it is possible for consumers of calcite to define a custom dialect targeted for their needs. Right now, only sequence fetch rendering is handled that way, but I plan to remove all checks against the database product of a dialect like e.g. dialect.getDatabaseProduct() == DatabaseProduct.MYSQL that are currently spread around the code base. I'd rather have methods that return whether a dialect supports a specific feature or handle rendering/unparsing an expression, clause or whole query than checks in the main code base that only work for specific database products. One of the most recent changes of supported features is MySQL since version 8 will have support for e.g. CTEs. How would the current code handle that? The way I am proposing it, the MysqlDialect can self configure during instantiation, detect based on the version if a feature is supported and make the capability accessible to the code that handles the unparsing. Apart from that, I'd still need to know how to actually make the planner choose the correct plan. Since there is no Jdbc input to the Project node, the query isn't rendered with the JdbcConvention. Any idea how I could achieve that?
          Hide
          chris-baynes Chris Baynes added a comment -

          Julian Hyde I think you mean CALCITE-1913 where there's a discussion about including the db version in the dialect.

          Show
          chris-baynes Chris Baynes added a comment - Julian Hyde I think you mean CALCITE-1913 where there's a discussion about including the db version in the dialect.
          Hide
          julianhyde Julian Hyde added a comment -

          Yes, I meant 1913. Thanks for the correction.

          Show
          julianhyde Julian Hyde added a comment - Yes, I meant 1913. Thanks for the correction.
          Hide
          julianhyde Julian Hyde added a comment -

          It seems that we cannot fix this case (CALCITE-1940) without first doing CALCITE-1913 (make SqlDialect more extensible). Christian Beikov, Do you think it makes sense to re-organize PR 514 into commits that fix 1913 followed by commits that fix 1940? It would make it easier to review. The PR is already large, and I disagree with some parts but not others.

          Show
          julianhyde Julian Hyde added a comment - It seems that we cannot fix this case ( CALCITE-1940 ) without first doing CALCITE-1913 (make SqlDialect more extensible). Christian Beikov , Do you think it makes sense to re-organize PR 514 into commits that fix 1913 followed by commits that fix 1940? It would make it easier to review. The PR is already large, and I disagree with some parts but not others.
          Hide
          christian.beikov Christian Beikov added a comment -

          Julian Hyde I don't think it makes sense as that would require quite a rewrite and might not be fully what you agree with either. Apart from that, the sequence calls are still not pushed down to the DBMS yet so this is still a bit away from being done.

          Show
          christian.beikov Christian Beikov added a comment - Julian Hyde I don't think it makes sense as that would require quite a rewrite and might not be fully what you agree with either. Apart from that, the sequence calls are still not pushed down to the DBMS yet so this is still a bit away from being done.
          Hide
          julianhyde Julian Hyde added a comment -

          I don't think it needs a rewrite. I'd do a git rebase -i, and try to make the first commit something deals with pluggable dialects (i.e. factories) but doesn't touch sequences.

          I'd been assuming that you this consider this PR ready for review; but if not, let me know when it is.

          Show
          julianhyde Julian Hyde added a comment - I don't think it needs a rewrite. I'd do a git rebase -i , and try to make the first commit something deals with pluggable dialects (i.e. factories) but doesn't touch sequences. I'd been assuming that you this consider this PR ready for review; but if not, let me know when it is.
          Hide
          julianhyde Julian Hyde added a comment -

          Having got CALCITE-1913's PR into a reasonable state, I am now attempting to rebase this PR on top of it. You can see the work-in-progress here: https://github.com/julianhyde/calcite/tree/1940-dialect-sequence. I'll post another comment when it's ready for review.

          Show
          julianhyde Julian Hyde added a comment - Having got CALCITE-1913 's PR into a reasonable state, I am now attempting to rebase this PR on top of it. You can see the work-in-progress here: https://github.com/julianhyde/calcite/tree/1940-dialect-sequence . I'll post another comment when it's ready for review.
          Hide
          christian.beikov Christian Beikov added a comment -

          I looked through your changes and they look good to me so far, but as I mentioned a few times, the sequence PR isn't ready yet
          I am still struggling with getting the sequence call being executed with the JdbcConvention because of (wrong?) planning, but maybe you have some time to take a look and find a solution for this. I'm a bit busy with other stuff now until the weekend anyway, so just tell me when you are ready so that I can pick up work from that point.

          Show
          christian.beikov Christian Beikov added a comment - I looked through your changes and they look good to me so far, but as I mentioned a few times, the sequence PR isn't ready yet I am still struggling with getting the sequence call being executed with the JdbcConvention because of (wrong?) planning, but maybe you have some time to take a look and find a solution for this. I'm a bit busy with other stuff now until the weekend anyway, so just tell me when you are ready so that I can pick up work from that point.
          Hide
          julianhyde Julian Hyde added a comment -

          I've just pushed commit https://github.com/julianhyde/calcite/commit/b17a2aa068d07f5a3483099e9efcc2252e348a55 to my branch https://github.com/julianhyde/calcite/tree/1940-dialect-sequence and I think the process of creating SqlDialect with sequence support is in good shape. In particular I replaced SequenceSupportResolver, which was frankly difficult to understand, with OptionalSequenceSupport, which is more concrete and only need to be implemented for one dialect (SQL Server).

          I don't intend to look into the JdbcConvention issues any time soon. But anyway, commit b17a2aa0 would be a good point to start when you next start work.

          Show
          julianhyde Julian Hyde added a comment - I've just pushed commit https://github.com/julianhyde/calcite/commit/b17a2aa068d07f5a3483099e9efcc2252e348a55 to my branch https://github.com/julianhyde/calcite/tree/1940-dialect-sequence and I think the process of creating SqlDialect with sequence support is in good shape. In particular I replaced SequenceSupportResolver, which was frankly difficult to understand, with OptionalSequenceSupport, which is more concrete and only need to be implemented for one dialect (SQL Server). I don't intend to look into the JdbcConvention issues any time soon. But anyway, commit b17a2aa0 would be a good point to start when you next start work.

            People

            • Assignee:
              julianhyde Julian Hyde
              Reporter:
              christian.beikov Christian Beikov
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:

                Development