SPARK-42873

Define Spark SQL types as keywords

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5.0
    • Fix Version/s: 3.5.0
    • Component/s: SQL
    • Labels: None

    Description

      Currently, Spark SQL defines primitive types as:

        | identifier (LEFT_PAREN INTEGER_VALUE
            (COMMA INTEGER_VALUE)* RIGHT_PAREN)?    #primitiveDataType

      where identifier is parsed later by visitPrimitiveDataType():

        override def visitPrimitiveDataType(ctx: PrimitiveDataTypeContext): DataType = withOrigin(ctx) {
          val dataType = ctx.identifier.getText.toLowerCase(Locale.ROOT)
          (dataType, ctx.INTEGER_VALUE().asScala.toList) match {
            case ("boolean", Nil) => BooleanType
            case ("tinyint" | "byte", Nil) => ByteType
            case ("smallint" | "short", Nil) => ShortType
            case ("int" | "integer", Nil) => IntegerType
            case ("bigint" | "long", Nil) => LongType
            case ("float" | "real", Nil) => FloatType
            ...

      As a result, the type names are not Spark SQL keywords, which complicates analysing and transforming the parse tree: for example, when forming stable column aliases, the type has to be recognised from the text of a generic identifier node instead of a dedicated token.

      We need to define the Spark SQL types as keywords in SqlBaseLexer.g4; a sketch of the idea follows.
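
      A minimal sketch of what this could look like, assuming Spark's existing case-insensitive input stream (the exact token set and rule shape are up to the implementation):

        // SqlBaseLexer.g4 (sketch): dedicated tokens for the type names
        BOOLEAN: 'BOOLEAN';
        TINYINT: 'TINYINT';
        SMALLINT: 'SMALLINT';
        INT: 'INT';
        INTEGER: 'INTEGER';
        BIGINT: 'BIGINT';
        FLOAT: 'FLOAT';
        // ... and so on for the remaining types

        // SqlBaseParser.g4 (sketch): primitiveDataType can then match
        // tokens instead of a free-form identifier
        | (BOOLEAN | TINYINT | SMALLINT | INT | INTEGER | BIGINT | FLOAT /* ... */)
          (LEFT_PAREN INTEGER_VALUE (COMMA INTEGER_VALUE)* RIGHT_PAREN)?    #primitiveDataType

      Each new keyword presumably also has to be added to the non-reserved keyword lists (ansiNonReserved and nonReserved in SqlBaseParser.g4) so that these names remain usable as ordinary identifiers.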

      Typed literals have the same issue: the type prefixes "DATE", "TIMESTAMP_NTZ", "TIMESTAMP", "TIMESTAMP_LTZ", "INTERVAL", and "X" should also be defined as base lexer tokens, for example as sketched below.
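
      For the typed-literal prefixes, the lexer rules could be just as simple (again a sketch, not the final grammar; X is the prefix of hexadecimal literals such as X'1E'):

        // SqlBaseLexer.g4 (sketch): tokens for typed-literal prefixes
        DATE: 'DATE';
        TIMESTAMP: 'TIMESTAMP';
        TIMESTAMP_LTZ: 'TIMESTAMP_LTZ';
        TIMESTAMP_NTZ: 'TIMESTAMP_NTZ';
        INTERVAL: 'INTERVAL';
        X: 'X';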

      People

        Assignee: Max Gekk
        Reporter: Max Gekk
