Spark / SPARK-27638

date format yyyy-M-dd string comparison not handled properly


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.2
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      The example below works in both MySQL and Hive, but not in Spark.

      mysql> select * from date_test where date_col >= '2000-1-1';
      +------------+
      | date_col   |
      +------------+
      | 2000-01-01 |
      +------------+
      

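      The reason is that Spark casts both sides to String type when comparing a date with a string, in order to support partial dates. The comparison is therefore lexicographic: '2000-01-01' >= '2000-1-1' is false because '0' sorts before '1' at the sixth character. See https://issues.apache.org/jira/browse/SPARK-8420 for more details.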
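The difference between the two comparison strategies can be reproduced outside Spark. The following is a minimal Python sketch, not Spark code: it assumes the date column is rendered in zero-padded yyyy-MM-dd form before a string comparison, and the variable names are chosen only for illustration.

```python
from datetime import date

date_col = date(2000, 1, 1)
literal = "2000-1-1"

# String comparison: the date is rendered as 'yyyy-MM-dd' and compared
# character by character with the literal.
as_string = date_col.isoformat() >= literal  # '2000-01-01' >= '2000-1-1'

# Date comparison: the literal is parsed into a date before comparing.
year, month, day = (int(part) for part in literal.split("-"))
as_date = date_col >= date(year, month, day)

print(as_string)  # False: at index 5, '0' sorts before '1'
print(as_date)    # True
```

The string path rejects the row that MySQL and Hive return, which matches the behavior reported here.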

      Based on some tests, Hive and MySQL handle Date and String comparison as follows:
      Hive: casts to Date; partial dates are not supported.
      MySQL: casts to Date; certain partial dates are supported through specific date string parse rules. See str_to_datetime in https://github.com/mysql/mysql-server/blob/5.5/sql-common/my_time.c

      Here are two proposals:
      a. Follow the MySQL parse rules, although some partial date string comparisons still won't be supported.
      b. Try to cast the String value to Date; if the cast succeeds, use date.toString, otherwise keep the original string.
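Proposal (b) could look roughly like the Python sketch below. The helper name normalize_date_literal and the accepted pattern (1 to 4 digit year, 1 to 2 digit month and day) are assumptions made for illustration; they are not Spark's actual cast rules, and this is not the patch that resolved the issue.

```python
import re
from datetime import date

# Hypothetical pattern: full year-month-day strings only, padding optional.
_FULL_DATE = re.compile(r"([0-9]{1,4})-([0-9]{1,2})-([0-9]{1,2})")

def normalize_date_literal(value: str) -> str:
    """Return the canonical yyyy-MM-dd form if value is a valid full date,
    otherwise return value unchanged (hypothetical helper)."""
    match = _FULL_DATE.fullmatch(value.strip())
    if match is None:
        return value  # partial dates such as '2000-1' are left as-is
    year, month, day = (int(group) for group in match.groups())
    try:
        return date(year, month, day).isoformat()
    except ValueError:
        return value  # out-of-range fields, e.g. month 13

# After normalization, the string comparison from the report succeeds.
print("2000-01-01" >= normalize_date_literal("2000-1-1"))  # True
```

Because strings that fail the cast are passed through untouched, a partial date string still takes part in the existing string comparison, which is the point of falling back to the original string.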

            People

              Assignee: peng bo (pengbo)
              Reporter: peng bo (pengbo)