Description
It will not pushdown filters for In/InSet predicates:
spark.range(50).selectExpr("cast(id as int) as id").write.mode("overwrite").parquet("/tmp/parquet/t1") spark.read.parquet("/tmp/parquet/t1").where("id in (1L, 2L, 4L)").explain spark.read.parquet("/tmp/parquet/t1").where("id = 1L or id = 2L or id = 4L").explain
== Physical Plan == *(1) Filter cast(id#5 as bigint) IN (1,2,4) +- *(1) ColumnarToRow +- FileScan parquet [id#5] Batched: true, DataFilters: [cast(id#5 as bigint) IN (1,2,4)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/parquet/t1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:int> == Physical Plan == *(1) Filter (((id#7 = 1) OR (id#7 = 2)) OR (id#7 = 4)) +- *(1) ColumnarToRow +- FileScan parquet [id#7] Batched: true, DataFilters: [(((id#7 = 1) OR (id#7 = 2)) OR (id#7 = 4))], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/parquet/t1], PartitionFilters: [], PushedFilters: [Or(Or(EqualTo(id,1),EqualTo(id,2)),EqualTo(id,4))], ReadSchema: struct<id:int>
Attachments
Issue Links
- causes
-
SPARK-36130 UnwrapCastInBinaryComparison fail when in.list contain CheckOverflow expression
- Resolved
- is related to
-
SPARK-24994 Add UnwrapCastInBinaryComparison optimizer to simplify literal types
- Resolved
- links to