Spark / SPARK-36663

When an existing field name is a number, reading the ORC file fails with an error


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: SQL
    • Labels: None

    Description

      The problem can be reproduced as follows:

      // Run in spark-shell, where `spark` is the predefined SparkSession
      val path = "file:///tmp/test_orc"
      spark.range(1).withColumnRenamed("id", "100").repartition(1).write.orc(path)
      spark.read.orc(path)

      The resulting error message looks like this:

      org.apache.spark.sql.catalyst.parser.ParseException:
      mismatched input '100' expecting {'ADD', 'AFTER'....

      == SQL ==
      struct<100:bigint>
      -------^^^

      The error is actually raised by this call:

      CatalystSqlParser.parseDataType("struct<100:bigint>")


      Background: Spark makes this call while converting the schema of the ORC file into a Catalyst schema, passing it the string produced by TypeDescription.toString.

      // Code in OrcUtils
      private def toCatalystSchema(schema: TypeDescription): StructType = {
        CharVarcharUtils.replaceCharVarcharWithStringInSchema(
          CatalystSqlParser.parseDataType(schema.toString).asInstanceOf[StructType])
      }
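      To illustrate why schema.toString yields struct<100:bigint>, here is a minimal, self-contained model of the quoting decision. It assumes that ORC leaves a field name unquoted whenever it matches an UNQUOTED_NAMES-style pattern of letters, digits and '_' (an approximation of ORC's internals, not a copy of them); OrcField and OrcStyleRendering are hypothetical names, not ORC or Spark APIs. Because digits are allowed, a name such as 100 is printed bare, and Spark's parser then rejects it where a field name is expected.

```scala
// Hypothetical, simplified stand-in for one field of an ORC struct type.
final case class OrcField(name: String, typeName: String)

object OrcStyleRendering {
  // Assumption: approximates ORC's UNQUOTED_NAMES rule, under which names made
  // only of letters, digits and '_' are printed without backticks.
  private val UnquotedNames = "[a-zA-Z0-9_]+".r

  // Leave "plain" names bare; otherwise wrap in backticks, doubling embedded backticks.
  def renderName(name: String): String = name match {
    case UnquotedNames() => name
    case _               => "`" + name.replace("`", "``") + "`"
  }

  // Render fields as an ORC-style type string, e.g. struct<a:int,b:string>.
  def renderStruct(fields: Seq[OrcField]): String =
    fields.map(f => s"${renderName(f.name)}:${f.typeName}").mkString("struct<", ",", ">")
}
```

      For example, OrcStyleRendering.renderStruct(Seq(OrcField("100", "bigint"))) returns struct<100:bigint>, the same string that appears under "== SQL ==" in the error above.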

      I can currently think of two solutions:

      1. Modify the Spark SQL parser so that it recognizes this kind of schema.
      2. Make TypeDescription.toString quote numeric column names with backticks, because the following syntax is already supported:

        CatalystSqlParser.parseDataType("struct<`100`:bigint>")

      However, TypeDescription currently does not allow the UNQUOTED_NAMES variable to be changed. Should we first submit a PR to the ORC project to make this variable configurable?
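      The second option boils down to quoting every field name that the parser would not read as a plain identifier. The sketch below is a minimal illustration under a conservative assumption: only names of the form [A-Za-z_][A-Za-z0-9_]* are left bare, and everything else is wrapped in backticks. CatalystQuoting, quoteForCatalyst and toCatalystTypeString are hypothetical helpers, not existing Spark or ORC APIs, and the sketch does not try to handle reserved SQL keywords.

```scala
object CatalystQuoting {
  // Conservative assumption: a letter or '_' followed by letters, digits or '_'
  // is treated as a safe unquoted identifier. Reserved keywords are NOT handled.
  private val SimpleIdentifier = "[A-Za-z_][A-Za-z0-9_]*".r

  // Wrap anything else (e.g. "100") in backticks, doubling embedded backticks.
  def quoteForCatalyst(name: String): String = name match {
    case SimpleIdentifier() => name
    case _                  => "`" + name.replace("`", "``") + "`"
  }

  // Render (name, type) pairs as a struct type string intended for
  // CatalystSqlParser.parseDataType.
  def toCatalystTypeString(fields: Seq[(String, String)]): String =
    fields
      .map { case (name, tpe) => s"${quoteForCatalyst(name)}:$tpe" }
      .mkString("struct<", ",", ">")
}
```

      With this rule, CatalystQuoting.toCatalystTypeString(Seq("100" -> "bigint")) produces struct<`100`:bigint>, i.e. the backtick-quoted form that the parser already accepts. Quoting more names than strictly necessary is harmless for the same reason.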


      What do the Spark maintainers think about this issue?


      Attachments

        1. image-2021-09-03-20-56-28-846.png (177 kB, mcdull_zhang)


          People

            Assignee: Kousuke Saruta (sarutak)
            Reporter: mcdull_zhang
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved: