Spark / SPARK-27790 Support ANSI SQL INTERVAL types / SPARK-37240

Cannot read partitioned parquet files with ANSI interval partition values


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: SQL
    • Labels: None

    Description

      The code below demonstrates the issue:

      scala> sql("SELECT INTERVAL '1' YEAR AS i, 0 as id").write.partitionBy("i").parquet("/Users/maximgekk/tmp/ansi_interval_parquet")
      
      
      scala> spark.read.schema("i INTERVAL YEAR, id INT").parquet("/Users/maximgekk/tmp/ansi_interval_parquet").show(false)
      21/11/08 10:56:36 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
      java.lang.RuntimeException: DataType INTERVAL YEAR is not supported in column vectorized reader.
      	at org.apache.spark.sql.execution.vectorized.ColumnVectorUtils.populate(ColumnVectorUtils.java:100)
      	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initBatch(VectorizedParquetRecordReader.java:243)
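      The exception is raised from the vectorized Parquet reader while it populates the partition column (ColumnVectorUtils.populate), so on affected versions (before the 3.3.0 fix) a possible workaround is to disable the vectorized reader and fall back to the row-based path. This is a hedged sketch, not a fix confirmed by this ticket: spark.sql.parquet.enableVectorizedReader is a standard Spark SQL config, but whether the non-vectorized path handles ANSI interval partition values correctly here is an assumption.

      ```scala
      // Sketch of a workaround on versions hit by this bug (before 3.3.0).
      // Assumption: the failure is specific to the vectorized code path;
      // the row-based reader's handling of INTERVAL YEAR is not verified
      // by this ticket. Run in spark-shell, where `spark` is the session.
      spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

      spark.read
        .schema("i INTERVAL YEAR, id INT")
        .parquet("/Users/maximgekk/tmp/ansi_interval_parquet")
        .show(false)
      ```

      Re-enable the vectorized reader afterwards (set the config back to "true"), since the row-based path is slower for ordinary columnar scans.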
      

          People

            Assignee: Max Gekk (maxgekk)
            Reporter: Max Gekk (maxgekk)
            Votes: 0
            Watchers: 4
