Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.2.0
- Fix Version/s: None
- Component/s: None
Description
When reading a dictionary-encoded integer column from a Parquet file with a user-specified read schema of bigint, Spark currently fails with the following exception:
java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
  at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:50)
  at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:344)
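The failure happens because Parquet's Dictionary base class throws UnsupportedOperationException for every decode method a concrete dictionary does not override, and PlainIntegerDictionary only implements decodeToInt, so Spark's long read path falls through to the default. Below is a minimal sketch of one possible direction for a fix: decode INT32 dictionary entries as int and widen to long. The class and parameter names here are hypothetical, not the actual Spark patch:

import org.apache.parquet.column.Dictionary

// Hypothetical widening shim: when the file stores INT32 but the read
// schema asks for bigint, decode as int and widen instead of delegating
// decodeToLong to a dictionary that never implements it.
class WideningDictionary(dict: Dictionary, int32Backed: Boolean) {
  def decodeToLong(id: Int): Long =
    if (int32Backed) dict.decodeToInt(id).toLong  // INT32 on disk: widen
    else dict.decodeToLong(id)                    // INT64 on disk: direct
}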
To reproduce (inside a Spark test suite that mixes in ParquetTest, which provides the withParquetFile helper):

import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Repeat each value 10 times so Parquet dictionary-encodes the column.
val data = (0 to 10).flatMap(n => Seq.fill(10)(n)).map(i => (i, i.toString))
withParquetFile(data) { path =>
  // The file stores "_1" as INT32; request it back as bigint.
  val readSchema = StructType(Seq(StructField("_1", LongType)))
  spark.read.schema(readSchema).parquet(path).first()
}
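Until this is fixed, one workaround is to let Spark infer the file's native INT32 schema and cast the column afterwards, which avoids the dictionary long-decode path entirely. A sketch:

import org.apache.spark.sql.functions.col

// Read with the file's native schema, then cast to bigint after the scan.
val widened = spark.read.parquet(path).withColumn("_1", col("_1").cast("long"))
widened.first()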
Issue Links
- Blocked
  - SPARK-36990 Long columns cannot read columns with INT32 type in the parquet file (Resolved)
- relates to
  - SPARK-23007 Add schema evolution test suite for file-based data sources (Resolved)