Description
Currently, Spark SQL defines primitive types as:
| identifier (LEFT_PAREN INTEGER_VALUE (COMMA INTEGER_VALUE)* RIGHT_PAREN)? #primitiveDataType
where identifier is parsed later by visitPrimitiveDataType():
override def visitPrimitiveDataType(ctx: PrimitiveDataTypeContext): DataType = withOrigin(ctx) {
  val dataType = ctx.identifier.getText.toLowerCase(Locale.ROOT)
  (dataType, ctx.INTEGER_VALUE().asScala.toList) match {
    case ("boolean", Nil) => BooleanType
    case ("tinyint" | "byte", Nil) => ByteType
    case ("smallint" | "short", Nil) => ShortType
    case ("int" | "integer", Nil) => IntegerType
    case ("bigint" | "long", Nil) => LongType
    case ("float" | "real", Nil) => FloatType
    ...
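The dispatch above can be illustrated with a minimal, self-contained sketch. This is not Spark's actual code: the `DataType` case objects here are simplified stand-ins for `org.apache.spark.sql.types`, and `resolve` is a hypothetical helper. It only shows how the type name, being a plain identifier rather than a token, must be lower-cased and string-matched at visit time:

```scala
import java.util.Locale

object PrimitiveTypeSketch {
  // Simplified stand-ins for Spark's DataType hierarchy (assumption:
  // the real visitor returns org.apache.spark.sql.types.DataType values).
  sealed trait DataType
  case object BooleanType extends DataType
  case object ByteType    extends DataType
  case object ShortType   extends DataType
  case object IntegerType extends DataType
  case object LongType    extends DataType
  case object FloatType   extends DataType

  // Mirrors the match in visitPrimitiveDataType: the type name is an
  // arbitrary identifier string, so it is normalized and pattern-matched.
  def resolve(name: String, params: List[Int]): Option[DataType] =
    (name.toLowerCase(Locale.ROOT), params) match {
      case ("boolean", Nil)            => Some(BooleanType)
      case ("tinyint" | "byte", Nil)   => Some(ByteType)
      case ("smallint" | "short", Nil) => Some(ShortType)
      case ("int" | "integer", Nil)    => Some(IntegerType)
      case ("bigint" | "long", Nil)    => Some(LongType)
      case ("float" | "real", Nil)     => Some(FloatType)
      case _                           => None // unknown or parameterized
    }
}
```

Because the match happens on raw strings, any tool walking the parse tree sees only a generic identifier node, not a type keyword.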
So the type names are not Spark SQL keywords, which causes inconveniences when analysing or transforming the parse tree, for example when forming stable column aliases.
The Spark SQL type names should therefore be defined as tokens in SqlBaseLexer.g4.
Typed literals have the same issue: the type prefixes "DATE", "TIMESTAMP_NTZ", "TIMESTAMP", "TIMESTAMP_LTZ", "INTERVAL", and "X" should also be defined as base lexer tokens.
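A minimal sketch of the consequence for typed literals, under the same assumption (this helper and its name are hypothetical, not Spark code): since these prefixes are not lexer tokens, any consumer that wants to recognize a typed literal such as DATE '2020-01-01' must compare the identifier text case-insensitively against a hard-coded list:

```scala
import java.util.Locale

object TypedLiteralSketch {
  // The prefixes listed in this issue; as identifiers they carry no
  // token type, so recognition falls back to string comparison.
  private val typedLiteralPrefixes =
    Set("DATE", "TIMESTAMP_NTZ", "TIMESTAMP", "TIMESTAMP_LTZ", "INTERVAL", "X")

  def isTypedLiteralPrefix(identifier: String): Boolean =
    typedLiteralPrefixes.contains(identifier.toUpperCase(Locale.ROOT))
}
```

Defining the prefixes as lexer tokens would let the grammar, rather than ad hoc string checks, distinguish them.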
Issue Links
- is caused by SPARK-40822 Use stable derived-column-alias algorithm, suitable for CREATE VIEW (Resolved)
- is cloned by SPARK-42979 Define literal constructors as keywords (Resolved)