Details
Description
How to reproduce this issue:
spark.sql( """ |CREATE TABLE `t1` ( | `_col0` INT, | `_col1` STRING, | `_col2` STRUCT<`c1`: STRING, `c2`: STRING, `c3`: STRING, `c4`: BIGINT>, | `_col3` STRING) |USING orc |PARTITIONED BY (_col3) |""".stripMargin) spark.sql("INSERT INTO `t1` values(1, '2', null, '2021-02-01')") spark.sql("SELECT _col2.c1, _col0 FROM `t1` WHERE _col3 = '2021-02-01'").show
Error message:
java.lang.AssertionError: assertion failed: The given data schema struct<_col0:int,_col2:struct<c1:string>> has less fields than the actual ORC physical schema, no idea which columns were dropped, fail to read. Try to disable
  at scala.Predef$.assert(Predef.scala:223)
  at org.apache.spark.sql.execution.datasources.orc.OrcUtils$.requestedColumnIds(OrcUtils.scala:159)
  at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$3(OrcFileFormat.scala:180)
  at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2620)
  at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$1(OrcFileFormat.scala:178)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:117)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:165)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:94)
  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:756)
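A possible workaround, inferred from the linked SPARK-35010 rather than stated in this report: disable nested schema pruning so the data schema handed to the ORC reader is not pruned below the physical file schema.

// Hedged workaround sketch (assumption based on SPARK-35010, not confirmed here):
// turn off nested schema pruning before reading the nested struct column.
spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", "false")
spark.sql("SELECT _col2.c1, _col0 FROM `t1` WHERE _col3 = '2021-02-01'").show()

Note that this trades the failure for reading the full `_col2` struct instead of only `c1`, so it is a mitigation, not a fix.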
Issue Links
- is duplicated by:
  - SPARK-35010 nestedSchemaPruning causes issue when reading hive generated Orc files (Resolved)
  - SPARK-35190 all columns are read even if column pruning applies when spark3.0 read table written by spark2.2 (Resolved)
  - SPARK-35191 all columns are read even if column pruning applies when spark3.0 read table written by spark2.2 (Resolved)
- is related to:
  - HIVE-4243 Fix column names in FileSinkOperator (Closed)