Description
Currently, Spark SQL defines primitive types as:
| identifier (LEFT_PAREN INTEGER_VALUE (COMMA INTEGER_VALUE)* RIGHT_PAREN)? #primitiveDataType
where identifier is parsed later by visitPrimitiveDataType():
override def visitPrimitiveDataType(ctx: PrimitiveDataTypeContext): DataType = withOrigin(ctx) {
  val dataType = ctx.identifier.getText.toLowerCase(Locale.ROOT)
  (dataType, ctx.INTEGER_VALUE().asScala.toList) match {
    case ("boolean", Nil) => BooleanType
    case ("tinyint" | "byte", Nil) => ByteType
    case ("smallint" | "short", Nil) => ShortType
    case ("int" | "integer", Nil) => IntegerType
    case ("bigint" | "long", Nil) => LongType
    case ("float" | "real", Nil) => FloatType
    ...
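The dispatch above can be illustrated with a minimal, self-contained sketch. This is not Spark's actual code: the `DataType` case objects here are simplified stand-ins for `org.apache.spark.sql.types`, and `resolve` is a hypothetical helper. It only shows how the type name, being a plain identifier rather than a token, must be lower-cased and string-matched at visit time:

```scala
import java.util.Locale

object PrimitiveTypeSketch {
  // Simplified stand-ins for Spark's DataType hierarchy (assumption:
  // the real visitor returns org.apache.spark.sql.types.DataType values).
  sealed trait DataType
  case object BooleanType extends DataType
  case object ByteType    extends DataType
  case object ShortType   extends DataType
  case object IntegerType extends DataType
  case object LongType    extends DataType
  case object FloatType   extends DataType

  // Mirrors the match in visitPrimitiveDataType: the type name is an
  // arbitrary identifier string, so it is normalized and pattern-matched.
  def resolve(name: String, params: List[Int]): Option[DataType] =
    (name.toLowerCase(Locale.ROOT), params) match {
      case ("boolean", Nil)            => Some(BooleanType)
      case ("tinyint" | "byte", Nil)   => Some(ByteType)
      case ("smallint" | "short", Nil) => Some(ShortType)
      case ("int" | "integer", Nil)    => Some(IntegerType)
      case ("bigint" | "long", Nil)    => Some(LongType)
      case ("float" | "real", Nil)     => Some(FloatType)
      case _                           => None // unknown or parameterized
    }
}
```

Because the match happens on raw strings, any tool walking the parse tree sees only a generic identifier node, not a type keyword.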
So the type names are not Spark SQL keywords, which causes inconveniences when analysing or transforming the parse tree, for example when forming stable column aliases.
The Spark SQL type names should therefore be defined as tokens in SqlBaseLexer.g4.
Typed literals have the same issue: the type prefixes "DATE", "TIMESTAMP_NTZ", "TIMESTAMP", "TIMESTAMP_LTZ", "INTERVAL", and "X" should also be defined as base lexer tokens.
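A minimal sketch of the consequence for typed literals, under the same assumption (this helper and its name are hypothetical, not Spark code): since these prefixes are not lexer tokens, any consumer that wants to recognize a typed literal such as DATE '2020-01-01' must compare the identifier text case-insensitively against a hard-coded list:

```scala
import java.util.Locale

object TypedLiteralSketch {
  // The prefixes listed in this issue; as identifiers they carry no
  // token type, so recognition falls back to string comparison.
  private val typedLiteralPrefixes =
    Set("DATE", "TIMESTAMP_NTZ", "TIMESTAMP", "TIMESTAMP_LTZ", "INTERVAL", "X")

  def isTypedLiteralPrefix(identifier: String): Boolean =
    typedLiteralPrefixes.contains(identifier.toUpperCase(Locale.ROOT))
}
```

Defining the prefixes as lexer tokens would let the grammar, rather than ad hoc string checks, distinguish them.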
Issue Links
- is caused by SPARK-40822 Use stable derived-column-alias algorithm, suitable for CREATE VIEW (Resolved)
- is cloned by SPARK-42979 Define literal constructors as keywords (Resolved)