Description
The example below works in both MySQL and Hive, but not in Spark.
mysql> select * from date_test where date_col >= '2000-1-1';
+------------+
| date_col |
+------------+
| 2000-01-01 |
+------------+
The reason is that Spark casts both sides to StringType when comparing a Date with a String, in order to support partial dates. See https://issues.apache.org/jira/browse/SPARK-8420 for more details.
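The effect of comparing as strings can be seen with plain lexicographic comparison (a minimal sketch; the values mirror the MySQL example above, not Spark internals):

```python
from datetime import date

# Hypothetical values mirroring the example: date_col rendered as a
# string, and the partial date literal from the WHERE clause.
stored = "2000-01-01"
literal = "2000-1-1"

# As strings, '0' < '1' at index 5, so the predicate is false
# and the row is filtered out.
print(stored >= literal)   # False

# As dates, the same comparison is true.
print(date(2000, 1, 1) >= date(2000, 1, 1))  # True
```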
Based on some tests, the Date and String comparison behavior in Hive and MySQL is:
Hive: cast to Date; partial dates are not supported.
MySQL: cast to Date; certain "partial date" strings are supported via dedicated date-string parse rules. Check out str_to_datetime in https://github.com/mysql/mysql-server/blob/5.5/sql-common/my_time.c
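A toy sketch of a MySQL-style lenient rule (this is an assumption for illustration, not the actual str_to_datetime logic, which handles many more formats): split the string on non-digit delimiters and build a date from the parts.

```python
import re
from datetime import date

def lenient_parse(s):
    """Toy lenient parser (hypothetical): split on any run of
    non-digit delimiters and build a date from the three parts.
    Returns None when the string is not a usable partial date."""
    parts = re.split(r"\D+", s.strip())
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return None
    try:
        return date(int(parts[0]), int(parts[1]), int(parts[2]))
    except ValueError:
        return None

print(lenient_parse("2000-1-1"))    # 2000-01-01
print(lenient_parse("2000.1.1"))    # 2000-01-01
print(lenient_parse("not a date"))  # None
```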
Here are two proposals:
a. Follow the MySQL parse rules; some partial date string comparisons still won't be supported.
b. Cast the String value to Date: if the cast succeeds, compare using date.toString; otherwise fall back to the original string.
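Proposal (b) can be sketched as follows (a hypothetical `normalize` helper, not Spark's implementation; strptime stands in for the Date cast):

```python
from datetime import datetime

def normalize(s):
    """Proposal (b) sketch: cast the string to a date if possible
    and return its canonical ISO form; otherwise keep the original
    string unchanged."""
    try:
        return datetime.strptime(s.strip(), "%Y-%m-%d").date().isoformat()
    except ValueError:
        return s

# date_col is stored as '2000-01-01'; the literal is a partial date.
print(normalize("2000-1-1"))                  # '2000-01-01'
print("2000-01-01" >= normalize("2000-1-1"))  # True
print(normalize("garbage"))                   # 'garbage'
```

With this rule the example query keeps its row, while non-date strings still compare the same way they do today.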
Issue Links
- is related to SPARK-40610 Spark fall back to use getPartitions instead of getPartitionsByFilter when date_add functions used in where clause (Resolved)