Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1684

Change default precision of VARCHAR and VARBINARY from 1 to "unspecified"

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.11.0
    • Fix Version/s: 1.12.0
    • Component/s: core
    • Labels:
      None

      Description

      The behavior for Calcite (and SQL Server) is to set VARCHAR precision to 1 when not specified whereas Phoenix sets the max integer value of 2147483647.

      It doesn't really make sense to create a VARCHAR for a max length of 1 (it takes more bytes to store the length of each row than the actual value) and it shouldn't be the default behavior. I think we should adopt the Phoenix behavior. Do we need to make this configurable via SqlConformance or other?

        Issue Links

          Activity

          Hide
          julianhyde Julian Hyde added a comment - - edited

          I basically agree. I would like to VARCHAR and VARBINARY of unbounded length, similar to the "String" or "Text" types in some databases. I would store a precision of -1, and print the type as "VARCHAR(*)" or something. I don't want anyone to think that you can store a string of length 2^31-1 but not a string of length 2^31, because that's not true.

          Show
          julianhyde Julian Hyde added a comment - - edited I basically agree. I would like to VARCHAR and VARBINARY of unbounded length, similar to the "String" or "Text" types in some databases. I would store a precision of -1, and print the type as "VARCHAR(*)" or something. I don't want anyone to think that you can store a string of length 2^31-1 but not a string of length 2^31, because that's not true.
          Hide
          kliew Kevin Liew added a comment - - edited

          Phoenix has a limit of 2147483647 for VARCHAR and VARBINARY when specifying the precision.

          VARCHAR ( precisionInt )
          

          https://phoenix.apache.org/language/datatypes.html#varchar_type
          https://phoenix.apache.org/language/index.html#int

          In Calcite, we will report RelDataType.PRECISION_NOT_SPECIFIED which is -1. In Phoenix, we will convert that value to 2147483647 when getting column metadata. Does that sound correct?

          James Taylor, if we do not specify the precision, is the string length unbounded (despite the max specifiable precision being 2147483647)? And in that case, Phoenix would have the same behavior as Calcite and print -1 for COLUMN_SIZE metadata?

          Show
          kliew Kevin Liew added a comment - - edited Phoenix has a limit of 2147483647 for VARCHAR and VARBINARY when specifying the precision. VARCHAR ( precisionInt ) https://phoenix.apache.org/language/datatypes.html#varchar_type https://phoenix.apache.org/language/index.html#int In Calcite, we will report RelDataType.PRECISION_NOT_SPECIFIED which is -1 . In Phoenix, we will convert that value to 2147483647 when getting column metadata. Does that sound correct? James Taylor , if we do not specify the precision, is the string length unbounded (despite the max specifiable precision being 2147483647)? And in that case, Phoenix would have the same behavior as Calcite and print -1 for COLUMN_SIZE metadata?
          Hide
          julianhyde Julian Hyde added a comment -

          I have modified the patch to use -1. This has the nice effect that if the user writes "VARCHAR" then it un-parses as "VARCHAR" because we store -1 meaning "unspecified", whereas with Kevin Liew's current change it un-parses as "VARCHAR(2147483647)".

          Also note that Calcite's default max precision for VARCHAR is 65,536. But that can be overridden in RelDataTypeSystem, and maybe Phoenix does that already.

          In my modified patch, if you concatenate two strings and their combined length exceeds the maximum precision, then the result type will have precision "unspecified".

          Here is my proposed change: https://github.com/julianhyde/calcite/commit/52c58b1ec68660ee9e9e084472b16ff6ec46d1f1. Please review.

          Show
          julianhyde Julian Hyde added a comment - I have modified the patch to use -1. This has the nice effect that if the user writes "VARCHAR" then it un-parses as "VARCHAR" because we store -1 meaning "unspecified", whereas with Kevin Liew 's current change it un-parses as "VARCHAR(2147483647)". Also note that Calcite's default max precision for VARCHAR is 65,536. But that can be overridden in RelDataTypeSystem, and maybe Phoenix does that already. In my modified patch, if you concatenate two strings and their combined length exceeds the maximum precision, then the result type will have precision "unspecified". Here is my proposed change: https://github.com/julianhyde/calcite/commit/52c58b1ec68660ee9e9e084472b16ff6ec46d1f1 . Please review.
          Hide
          julianhyde Julian Hyde added a comment -

          Kevin Liew, James Taylor, I would like to commit this change for 1.12. Can you let me know whether my proposal - to set the precision to -1, not Integer.MAX_VALUE - will meet Phoenix's needs. Integer.MAX_VALUE is a somewhat arbitrary value that might make sense for Phoenix but doesn't make sense for other DBs.

          Show
          julianhyde Julian Hyde added a comment - Kevin Liew , James Taylor , I would like to commit this change for 1.12. Can you let me know whether my proposal - to set the precision to -1, not Integer.MAX_VALUE - will meet Phoenix's needs. Integer.MAX_VALUE is a somewhat arbitrary value that might make sense for Phoenix but doesn't make sense for other DBs.
          Hide
          kliew Kevin Liew added a comment -

          +1 this is a good general solution. If Phoenix has a max precision of 2147483647 when the precision is unspecified, it can be done in the Phoenix layer.

          Show
          kliew Kevin Liew added a comment - +1 this is a good general solution. If Phoenix has a max precision of 2147483647 when the precision is unspecified, it can be done in the Phoenix layer.
          Show
          julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/0d996daf .
          Show
          julianhyde Julian Hyde added a comment - Commits http://git-wip-us.apache.org/repos/asf/calcite/commit/7f46cd5f and http://git-wip-us.apache.org/repos/asf/calcite/commit/6b54b6ec fixed some plan changes caused by this change.
          Hide
          julianhyde Julian Hyde added a comment -

          Resolved in release 1.12.0 (2017-03-24).

          Show
          julianhyde Julian Hyde added a comment - Resolved in release 1.12.0 (2017-03-24).

            People

            • Assignee:
              julianhyde Julian Hyde
              Reporter:
              kliew Kevin Liew
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development