[SPARK-46502] Support timestamp types in UnwrapCastInBinaryComparison - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 4.0.0
Fix Version/s: 4.0.0
Component/s: SQL
Labels:
- pull-request-available

Description

Spark has an optimizer rule, `UnwrapCastInBinaryComparison`, that handles similar cases by moving a cast off the column side of a binary comparison with a literal, but it does not cover timestamp types.

Consider the following query plan:

```
== Analyzed Logical Plan ==
batch: timestamp
Project [batch#26466]
+- Filter (batch#26466 >= cast(2023-12-21 10:00:00 as timestamp))
   +- SubqueryAlias spark_catalog.default.timestamp_view
      +- View (`spark_catalog`.`default`.`timestamp_view`, [batch#26466])
         +- Project [cast(batch#26467 as timestamp) AS batch#26466]
            +- Project [cast(batch#26463 as timestamp) AS batch#26467]
               +- SubqueryAlias spark_catalog.default.table_timestamp
                  +- Relation spark_catalog.default.table_timestamp[batch#26463] parquet

== Optimized Logical Plan ==
Project [cast(batch#26463 as timestamp) AS batch#26466]
+- Filter (isnotnull(batch#26463) AND (cast(batch#26463 as timestamp) >= 2023-12-21 10:00:00))
   +- Relation spark_catalog.default.table_timestamp[batch#26463] parquet
```

The predicate compares a `timestamp_ntz` column with a literal value. Because the view wraps the column in a cast to `timestamp`, the string literal is also cast to `timestamp`. A cast applied to a literal is foldable, so the optimizer evaluates it early into a `timestamp` literal, and the predicate becomes `cast(batch#26463 as timestamp) >= 2023-12-21 10:00:00`. Because the cast remains on the column side, the predicate cannot be pushed down to the data source or table.
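The rewrite this issue asks for can be illustrated with a minimal Python sketch. It uses a toy expression model, not Spark's Catalyst classes: `Column`, `CastToTimestamp`, `Literal`, `GreaterThanOrEqual`, `unwrap_cast`, and `evaluate` are all hypothetical names. The sketch moves the cast from the column side to the literal side by converting the `timestamp` literal back into a `timestamp_ntz` value in the session time zone, so the remaining predicate compares the raw column. It assumes a fixed-offset session time zone, where the `timestamp_ntz` to `timestamp` conversion is strictly increasing and invertible; zones with daylight-saving transitions need extra care, and this sketch does not show how the actual Spark rule handles them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Union


# Toy expression nodes (hypothetical; not Spark's Catalyst classes).
@dataclass(frozen=True)
class Column:
    name: str
    data_type: str  # "timestamp_ntz" or "timestamp"


@dataclass(frozen=True)
class CastToTimestamp:
    child: Column  # models cast(child as timestamp)


@dataclass(frozen=True)
class Literal:
    value: datetime  # naive for timestamp_ntz, timezone-aware for timestamp
    data_type: str


Expr = Union[Column, CastToTimestamp, Literal]


@dataclass(frozen=True)
class GreaterThanOrEqual:
    left: Expr
    right: Expr


def ntz_to_timestamp(local: datetime, zone: timezone) -> datetime:
    """Interpret a naive (timestamp_ntz) value as wall-clock time in the session zone."""
    return local.replace(tzinfo=zone)


def timestamp_to_ntz(instant: datetime, zone: timezone) -> datetime:
    """Inverse of ntz_to_timestamp, valid for a fixed-offset zone."""
    return instant.astimezone(zone).replace(tzinfo=None)


def unwrap_cast(pred: GreaterThanOrEqual, zone: timezone) -> GreaterThanOrEqual:
    """Rewrite cast(ntz_col as timestamp) >= ts_literal into ntz_col >= ntz_literal.

    Safe only because, for a fixed-offset zone, ntz_to_timestamp is strictly
    increasing and invertible, so both predicates select the same rows.
    """
    left, right = pred.left, pred.right
    if (
        isinstance(left, CastToTimestamp)
        and left.child.data_type == "timestamp_ntz"
        and isinstance(right, Literal)
        and right.data_type == "timestamp"
    ):
        # Convert the literal once, at "planning time", instead of casting every row.
        ntz_literal = Literal(timestamp_to_ntz(right.value, zone), "timestamp_ntz")
        return GreaterThanOrEqual(left.child, ntz_literal)
    return pred  # any other shape is left unchanged


def evaluate(pred: GreaterThanOrEqual, row: dict, zone: timezone) -> bool:
    """Evaluate a predicate against one row (column name -> naive datetime)."""
    def value(expr: Expr) -> datetime:
        if isinstance(expr, Column):
            return row[expr.name]
        if isinstance(expr, CastToTimestamp):
            return ntz_to_timestamp(value(expr.child), zone)
        return expr.value  # Literal
    return value(pred.left) >= value(pred.right)


# Usage: model the filter from the optimized plan above.
session_zone = timezone(timedelta(hours=-8))  # hypothetical fixed-offset session zone
batch = Column("batch", "timestamp_ntz")
original = GreaterThanOrEqual(
    CastToTimestamp(batch),
    Literal(datetime(2023, 12, 21, 10, 0, tzinfo=session_zone), "timestamp"),
)
# The rewritten predicate compares the raw timestamp_ntz column with a timestamp_ntz literal.
rewritten = unwrap_cast(original, session_zone)
```

The design point is where the conversion happens: the literal is converted once, and the predicate left behind references the stored column directly, which is the shape a data source can accept for pushdown.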

Issue Links

links to

GitHub Pull Request #44480

People

Assignee: L. C. Hsieh
Reporter: L. C. Hsieh
Votes: 0
Watchers: 2

Dates

Created: 25/Dec/23 09:11
Updated: 27/Dec/23 19:52
Resolved: 27/Dec/23 19:42