Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Versions: 2.3.1, 2.4.0
Description
For example, if an array column is used (where the lower and upper bounds for its column batch are null), it looks like it wrongly filters all data out:
scala> import org.apache.spark.sql.functions
import org.apache.spark.sql.functions

scala> val df = Seq(Array("a", "b"), Array("c", "d")).toDF("arrayCol")
df: org.apache.spark.sql.DataFrame = [arrayCol: array<string>]

scala> df.filter(df.col("arrayCol").eqNullSafe(functions.array(functions.lit("a"), functions.lit("b")))).show()
+--------+
|arrayCol|
+--------+
|  [a, b]|
+--------+

scala> df.cache().filter(df.col("arrayCol").eqNullSafe(functions.array(functions.lit("a"), functions.lit("b")))).show()
+--------+
|arrayCol|
+--------+
+--------+
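For reference, the same reproduction can be run outside the shell. The sketch below is a minimal standalone version of the steps above, assuming a local SparkSession; the object name EqNullSafeCacheRepro and the local[*] master are illustrative only. Both show() calls should return the single row [a, b], but after cache() the in-memory scan returns an empty result.

import org.apache.spark.sql.{SparkSession, functions}

object EqNullSafeCacheRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("eqNullSafe-cache-repro")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(Array("a", "b"), Array("c", "d")).toDF("arrayCol")

    // Null-safe equality against a literal array value.
    val predicate = df.col("arrayCol")
      .eqNullSafe(functions.array(functions.lit("a"), functions.lit("b")))

    // Before caching: returns the matching row [a, b].
    df.filter(predicate).show()

    // After caching: the filter over the cached relation returns no rows,
    // which is the behavior reported in this issue.
    df.cache().filter(predicate).show()

    spark.stop()
  }
}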