Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Fix Version/s: 4.0.0
Description
We have an optimization rule `UnwrapCastInBinaryComparison` that handles similar cases, but it doesn't cover timestamp types.
For a query plan like:
```
== Analyzed Logical Plan ==
batch: timestamp
Project [batch#26466]
+- Filter (batch#26466 >= cast(2023-12-21 10:00:00 as timestamp))
   +- SubqueryAlias spark_catalog.default.timestamp_view
      +- View (`spark_catalog`.`default`.`timestamp_view`, [batch#26466])
         +- Project [cast(batch#26467 as timestamp) AS batch#26466]
            +- Project [cast(batch#26463 as timestamp) AS batch#26467]
               +- SubqueryAlias spark_catalog.default.table_timestamp
                  +- Relation spark_catalog.default.table_timestamp[batch#26463] parquet

== Optimized Logical Plan ==
Project [cast(batch#26463 as timestamp) AS batch#26466]
+- Filter (isnotnull(batch#26463) AND (cast(batch#26463 as timestamp) >= 2023-12-21 10:00:00))
   +- Relation spark_catalog.default.table_timestamp[batch#26463] parquet
```
The predicate compares a timestamp_ntz column with a literal value. Because the column is wrapped in a cast to timestamp type, the string literal is also wrapped in a cast to timestamp type. The literal's cast is foldable, so it is evaluated to a timestamp literal early on, and the predicate becomes `cast(batch#26463 as timestamp) >= 2023-12-21 10:00:00`. Since the remaining cast is on the column side, the predicate cannot be pushed down to the data source/table.
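The fix would unwrap the cast from the column side by moving it to the literal side, i.e. rewrite `cast(col as timestamp) >= ts_literal` into `col >= cast(ts_literal as timestamp_ntz)`, leaving a bare column reference that the data source can push down. A minimal sketch of that idea, using a hypothetical mini-AST (the class and function names here are illustrative, not Spark's actual `Expression` classes):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical expression nodes; illustrative only.
@dataclass
class Column:
    name: str
    dtype: str  # e.g. "timestamp_ntz"

@dataclass
class Cast:
    child: object
    to: str

@dataclass
class Literal:
    value: object
    dtype: str

@dataclass
class GreaterOrEqual:
    left: object
    right: object

def unwrap_cast(pred, session_tz):
    """Rewrite cast(col as timestamp) >= ts_literal into
    col >= cast(ts_literal as timestamp_ntz), moving the cast from the
    column side to the literal side so the predicate can be pushed down."""
    if (isinstance(pred, GreaterOrEqual)
            and isinstance(pred.left, Cast)
            and isinstance(pred.left.child, Column)
            and pred.left.child.dtype == "timestamp_ntz"
            and pred.left.to == "timestamp"
            and isinstance(pred.right, Literal)
            and pred.right.dtype == "timestamp"):
        # Casting timestamp -> timestamp_ntz reinterprets the instant as a
        # wall-clock value in the session time zone; it loses no precision,
        # so folding it into the literal preserves the comparison's result.
        ntz_value = pred.right.value.astimezone(session_tz).replace(tzinfo=None)
        return GreaterOrEqual(pred.left.child,
                              Literal(ntz_value, "timestamp_ntz"))
    return pred  # leave anything else untouched
```

With the predicate from the plan above, `unwrap_cast` yields `batch >= TIMESTAMP_NTZ'2023-12-21 10:00:00'` (for a UTC session time zone), a plain column comparison that a Parquet source could evaluate.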