Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1241

Add a freemarker variable for adding non reserved keywords to Parser.jj template

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.8.0
    • Component/s: core
    • Labels:
      None

      Description

      Currently Calcite parser allows customizing parser grammar for supporting new Sql constructs (ex. SHOW TABLES or CREATE TABLE ... AS QUERY) through freemarker template variables. There is a freemarker template variable to allow new keywords, but all these keywords are added as reserved keywords which are not allowed as identifiers unless escaped with ` (or configured casing character).

      This JIRA is to add a freemarker template variable for adding non reserved keywords list.

        Issue Links

          Activity

          Hide
          julianhyde Julian Hyde added a comment -

          It makes sense, and the patch looks good. But I wonder, could we go further? Are there any pitfalls that people run into every time they add a keyword, and is there something we could do to make that easier?

          Show
          julianhyde Julian Hyde added a comment - It makes sense, and the patch looks good. But I wonder, could we go further? Are there any pitfalls that people run into every time they add a keyword, and is there something we could do to make that easier?
          Hide
          vkorukanti Venki Korukanti added a comment -

          This is the most common issue, I have seen. One other thing is I encounter is: if the first two keywords in a SQL construct are non-reserved keywords, parser goes down the expression parsing path (as part of OrderedQueryOrExpr). So any SQL construct need to have at least one reserved keyword in the first two tokens. Is it better to separate out query and expression parsing into two methods and move the expression parsing to last choice? Expression most likely is going to have identifiers. Having it in the beginning only causes non reserved keywords to be interpreted as identifiers.

          Show
          vkorukanti Venki Korukanti added a comment - This is the most common issue, I have seen. One other thing is I encounter is: if the first two keywords in a SQL construct are non-reserved keywords, parser goes down the expression parsing path (as part of OrderedQueryOrExpr ). So any SQL construct need to have at least one reserved keyword in the first two tokens. Is it better to separate out query and expression parsing into two methods and move the expression parsing to last choice? Expression most likely is going to have identifiers. Having it in the beginning only causes non reserved keywords to be interpreted as identifiers.
          Hide
          julianhyde Julian Hyde added a comment -

          SQL syntax unfortunately allows expressions and queries in the same context and there is no fixed lookahead that can distinguish which it is going to be. For example, x IN (((((((select 1 union ...) minus ... ) ... )))

          I don't know whether JavaCC has a performance difference depending on how you order the alternatives. If you can change the parser such that it performs better and gives the same results, have at it! Feel free to devise (or generate) queries where parser performance sucks.

          Your comments underscore the value of reserved keywords. I agree. They make the parser's life much easier (and error messages easier to understand). Unfortunately it is hard to justify adding reserved keywords. Generally we just take the list that comes with the SQL standard.

          Show
          julianhyde Julian Hyde added a comment - SQL syntax unfortunately allows expressions and queries in the same context and there is no fixed lookahead that can distinguish which it is going to be. For example, x IN (((((((select 1 union ...) minus ... ) ... ))) I don't know whether JavaCC has a performance difference depending on how you order the alternatives. If you can change the parser such that it performs better and gives the same results, have at it! Feel free to devise (or generate) queries where parser performance sucks. Your comments underscore the value of reserved keywords. I agree. They make the parser's life much easier (and error messages easier to understand). Unfortunately it is hard to justify adding reserved keywords. Generally we just take the list that comes with the SQL standard.
          Hide
          julianhyde Julian Hyde added a comment -

          The list of reserved keywords doesn't change very often (because it comes down from the SQL standard). A common mistake is to add a keyword and forget to add it to the non-reserved list. Should we have a list of reserved words (in a test, say) and a test that will break (and require manual intervention) each time someone adds a reserved word (intentionally or otherwise).

          We already have SqlParserTest.testGenerateKeyWords, which does something similar, but doesn't check for new reserved keywords explicitly. Plus it doesn't help projects that extend the core SQL parser, such as Drill and Phoenix.

          Show
          julianhyde Julian Hyde added a comment - The list of reserved keywords doesn't change very often (because it comes down from the SQL standard). A common mistake is to add a keyword and forget to add it to the non-reserved list. Should we have a list of reserved words (in a test, say) and a test that will break (and require manual intervention) each time someone adds a reserved word (intentionally or otherwise). We already have SqlParserTest.testGenerateKeyWords, which does something similar, but doesn't check for new reserved keywords explicitly. Plus it doesn't help projects that extend the core SQL parser, such as Drill and Phoenix.
          Hide
          vkorukanti Venki Korukanti added a comment -

          I think it makes sense to have a test in Calcite to check for new keywords and throw error if the test keyword list is not updated or forgot to add the new keywords to non-reserved keyword list. Testing the extended parsers is difficult unless they have their own tests which extend the list provided in the Calcite test. We can make the test code in extended parser smaller by providing the calcite keyword list. Let me know if you want the test with the patch.

          Show
          vkorukanti Venki Korukanti added a comment - I think it makes sense to have a test in Calcite to check for new keywords and throw error if the test keyword list is not updated or forgot to add the new keywords to non-reserved keyword list. Testing the extended parsers is difficult unless they have their own tests which extend the list provided in the Calcite test. We can make the test code in extended parser smaller by providing the calcite keyword list. Let me know if you want the test with the patch.
          Hide
          julianhyde Julian Hyde added a comment -

          Yes, please add a test to SqlParserTest as part of the patch. Call getJdbcKeywords (as in testMetadata), split on commas, remove non-reserved keywords, and you have the list of reserved keywords. Compare that with a list hard-coded in the test.

          Drill should consider calling SqlParserTest from within its test suite (one or two tests might need to be overridden or disabled, but it would still provide useful information). But anyway that is definitely part of this task.

          Show
          julianhyde Julian Hyde added a comment - Yes, please add a test to SqlParserTest as part of the patch. Call getJdbcKeywords (as in testMetadata), split on commas, remove non-reserved keywords, and you have the list of reserved keywords. Compare that with a list hard-coded in the test. Drill should consider calling SqlParserTest from within its test suite (one or two tests might need to be overridden or disabled, but it would still provide useful information). But anyway that is definitely part of this task.
          Hide
          vkorukanti Venki Korukanti added a comment -

          Updated the patch in PR. One change: Used getTokens and deciding whether the token is a reserved keyword or not. getJdbcKeywords is returning hardcoded SQL 92 standard reserved keyword list.

          Show
          vkorukanti Venki Korukanti added a comment - Updated the patch in PR. One change: Used getTokens and deciding whether the token is a reserved keyword or not. getJdbcKeywords is returning hardcoded SQL 92 standard reserved keyword list.
          Show
          julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/20ba4346 . Thanks for the PR, Venki Korukanti !
          Hide
          julianhyde Julian Hyde added a comment -

          Resolved in release 1.8.0 (2016-06-13).

          Show
          julianhyde Julian Hyde added a comment - Resolved in release 1.8.0 (2016-06-13).

            People

            • Assignee:
              julianhyde Julian Hyde
              Reporter:
              vkorukanti Venki Korukanti
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development