Description
Currently, filter pushdown does not work if the Parquet schema and the Hive metastore schema are in different letter cases, even when spark.sql.caseSensitive is false.
For example:
spark.range(10).write.parquet("/tmp/data")
sql("DROP TABLE t")
sql("CREATE TABLE t (ID LONG) USING parquet LOCATION '/tmp/data'")
sql("select * from t where id > 0").show
The filter is pushed down with the metastore's column name `ID`; since the Parquet file only contains `id`, the pushed filter matches nothing and the query silently returns wrong results:
scala> sql("select * from t where id > 0").explain // Filters are pushed with `ID`
== Physical Plan ==
*(1) Project [ID#90L]
+- *(1) Filter (isnotnull(id#90L) && (id#90L > 0))
   +- *(1) FileScan parquet default.t[ID#90L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/tmp/data], PartitionFilters: [], PushedFilters: [IsNotNull(ID), GreaterThan(ID,0)], ReadSchema: struct<ID:bigint>

scala> sql("select * from t").show // Parquet returns NULL for `ID` because it has `id`.
+----+
|  ID|
+----+
|null|
|null|
|null|
|null|
|null|
|null|
|null|
|null|
|null|
|null|
+----+

scala> sql("select * from t where id > 0").show // `NULL > 0` is `false`.
+---+
| ID|
+---+
+---+
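One way to reconcile the two schemas is to resolve the catalog column name against the Parquet file's own field names before building the pushdown predicate. Below is a minimal, self-contained sketch of such case-insensitive resolution; the object FilterFieldResolution, its resolve method, and the parameter names are hypothetical illustrations, not Spark's actual internals (caseSensitive stands in for the spark.sql.caseSensitive setting, parquetFieldNames for the column names read from the file footer):

object FilterFieldResolution {
  // Resolve the catalog column name (e.g. "ID") against the Parquet
  // schema (e.g. "id"). Returns None when the name is absent or ambiguous,
  // in which case the filter should simply not be pushed down.
  def resolve(catalogName: String,
              parquetFieldNames: Seq[String],
              caseSensitive: Boolean): Option[String] = {
    if (caseSensitive) {
      parquetFieldNames.find(_ == catalogName)
    } else {
      parquetFieldNames.filter(_.equalsIgnoreCase(catalogName)) match {
        case Seq(unique) => Some(unique) // push down using the file's own casing
        case _           => None         // zero or duplicate matches: skip pushdown
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // The catalog has `ID`, the Parquet file has `id`:
    println(resolve("ID", Seq("id"), caseSensitive = false))       // Some(id)
    // Fields that differ only in case cannot be resolved safely:
    println(resolve("ID", Seq("id", "iD"), caseSensitive = false)) // None
  }
}

Returning None on duplicate case-insensitive matches keeps this conservative: skipping pushdown only costs performance, whereas pushing a wrongly resolved name can change query results, as the example above shows.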
Issue Links
- is related to SPARK-25206: wrong records are returned when Hive metastore schema and parquet schema are in different letter cases (Resolved)