[SPARK-36663] When the existing field name is a number, an error will be reported when reading the orc file - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.3.0
Fix Version/s: 3.3.0
Component/s: SQL
Labels: None

Description

The problem can be reproduced with the following steps:

val path = "file:///tmp/test_orc"

spark.range(1).withColumnRenamed("id", "100").repartition(1).write.orc(path)

spark.read.orc(path)

The read fails with the following error:

org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '100' expecting {'ADD', 'AFTER'....

== SQL ==
struct<100:bigint>
-------^^^

The error is raised by a call equivalent to this one:

CatalystSqlParser.parseDataType("struct<100:bigint>")

Spark makes this call while converting the schema of the ORC file into a Catalyst schema: it parses the string form of the ORC schema, and the unquoted numeric field name is not a valid identifier for the parser.

// code in OrcUtils
private def toCatalystSchema(schema: TypeDescription): StructType = {
  CharVarcharUtils.replaceCharVarcharWithStringInSchema(
    CatalystSqlParser.parseDataType(schema.toString).asInstanceOf[StructType])
}

I can currently think of two solutions:

1. Modify the Spark SQL parser so that it accepts this kind of schema string.
2. Have the TypeDescription.toString method quote numeric column names with backticks, because the following syntax is already supported:
CatalystSqlParser.parseDataType("struct<`100`:bigint>")

However, TypeDescription currently does not allow the UNQUOTED_NAMES variable to be changed. Should we first submit a PR to the ORC project to make this variable configurable?
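
The quoting idea in the second option can be sketched in plain Scala, without Spark or ORC on the classpath. This is only an illustration: the object QuotedStructString, its methods, and the SafeName pattern are hypothetical names invented for this sketch, and the pattern is an assumed rule, not the one ORC uses.

```scala
object QuotedStructString {
  // Assumption for this sketch: a name is printed unquoted only if it starts
  // with a letter or underscore. A purely numeric name such as "100" fails
  // this check and is therefore quoted.
  private val SafeName = "^[a-zA-Z_][a-zA-Z0-9_]*$".r

  // Wrap the name in backticks when it is not a safe bare identifier.
  // Doubling embedded backticks is the escaping assumed here.
  def quoteIfNeeded(name: String): String =
    if (SafeName.pattern.matcher(name).matches()) name
    else "`" + name.replace("`", "``") + "`"

  // Build a struct type string from (fieldName, typeName) pairs.
  def structString(fields: Seq[(String, String)]): String =
    fields
      .map { case (name, dataType) => s"${quoteIfNeeded(name)}:$dataType" }
      .mkString("struct<", ",", ">")

  def main(args: Array[String]): Unit = {
    println(structString(Seq("100" -> "bigint")))                   // struct<`100`:bigint>
    println(structString(Seq("id" -> "bigint", "100" -> "string"))) // struct<id:bigint,`100`:string>
  }
}
```

A string produced this way has the quoted form shown in option 2. The sketch does not claim to be the change made in the linked pull requests.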

What do Spark members think about this issue?

Attachments

image-2021-09-03-20-56-28-846.png
03/Sep/21 12:56
177 kB
mcdull_zhang

Issue Links

links to

[Github] Pull Request #33915 (sarutak)

[Github] Pull Request #37440 (mcdull-zhang)

People

Assignee: Kousuke Saruta

Reporter: mcdull_zhang

Votes: 0

Watchers: 4

Dates

Created: 03/Sep/21 12:55

Updated: 08/Aug/22 13:00

Resolved: 17/Sep/21 13:55