[SPARK-25207] Case-insensitive field resolution for filter pushdown when reading Parquet - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.4.0
Fix Version/s: 2.4.0
Component/s: SQL
Labels:
- Parquet

Description

Currently, filter pushdown does not work correctly when the Parquet schema and the Hive metastore schema use different letter cases, even when spark.sql.caseSensitive is false.

For example:

spark.range(10).write.parquet("/tmp/data")
sql("DROP TABLE IF EXISTS t")
sql("CREATE TABLE t (ID LONG) USING parquet LOCATION '/tmp/data'")
sql("select * from t where id > 0").show

~~No filter will be pushed down.~~

scala> sql("select * from t where id > 0").explain   // Filters are pushed with `ID`
== Physical Plan ==
*(1) Project [ID#90L]
+- *(1) Filter (isnotnull(id#90L) && (id#90L > 0))
   +- *(1) FileScan parquet default.t[ID#90L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/tmp/data], PartitionFilters: [], PushedFilters: [IsNotNull(ID), GreaterThan(ID,0)], ReadSchema: struct<ID:bigint>

scala> sql("select * from t").show    // Parquet returns NULL for `ID` because it has `id`.
+----+
|  ID|
+----+
|null|
|null|
|null|
|null|
|null|
|null|
|null|
|null|
|null|
|null|
+----+

scala> sql("select * from t where id > 0").show   // `NULL > 0` is `false`.
+---+
| ID|
+---+
+---+
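
The mismatch above comes from pushing the metastore spelling (`ID`) to a file whose field is spelled `id`. A minimal sketch of case-insensitive field resolution follows. It is an illustration under stated assumptions, not the code from the linked pull request: `CaseInsensitiveResolution` and `resolveField` are hypothetical names, not Spark APIs, and treating an ambiguous match as "do not push down" is an assumption made here for safety.

```scala
// Hypothetical helper (not a Spark API): maps a filter's column name to the
// spelling actually used by the Parquet file, or None if it cannot be resolved.
object CaseInsensitiveResolution {
  def resolveField(
      parquetFields: Seq[String],
      requested: String,
      caseSensitive: Boolean): Option[String] = {
    if (caseSensitive) {
      // Case-sensitive mode: only an exact match counts.
      parquetFields.find(_ == requested)
    } else {
      // Case-insensitive mode: collect every field that matches ignoring case.
      parquetFields.filter(_.equalsIgnoreCase(requested)) match {
        case Seq(single) => Some(single) // unambiguous: use the file's own spelling
        case _           => None         // missing or ambiguous: skip pushdown (assumption)
      }
    }
  }
}
```

With the data from this report, `resolveField(Seq("id"), "ID", caseSensitive = false)` yields `Some("id")`, so a filter could be pushed with the file's spelling. With `caseSensitive = true`, or with a file that contains both `id` and `ID`, the sketch returns `None` and the filter would be evaluated by Spark only.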

Attachments

image.png
27/Aug/18 01:49
41 kB
Dongjoon Hyun

Issue Links

is related to

SPARK-25206 wrong records are returned when Hive metastore schema and parquet schema are in different letter cases

Resolved

links to

[Github] Pull Request #22197 (yucai)

People

Assignee: yucai

Reporter: yucai

Votes: 0

Watchers: 5

Dates

Created: 23/Aug/18 07:56

Updated: 13/Sep/18 07:54

Resolved: 31/Aug/18 11:25