Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-27772

UNIX_TIMESTAMP should return NULL when date fields are out of bounds

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 4.0.0
    • None

    Description

      For invalid dates such as 2001-02-31, 2023-04-31 etc, UNIX_TIMESTAMP() is giving out the timestamp value as the last valid date, rather than NULL. (e.g. UNIX_TIMESTAMP('2001-02-31', 'yyyy-MM-dd') gives 983354400, which converts to '2001-02-28'. However, for calendar days larger than 31, e.g. 2001-02-32, or 2023-04-32, UNIX_TIMESTAMP() would give NULL as a result.

      In Spark and mysql, UNIX_TIMESTMAP for these invalid dates are all NULL (or 0).

       

      6: jdbc:hive2://localhost:10001/> select month, datetimestamp, unix_timestamp(datetimestamp, 'yyyy-MM-dd') as timestampCol from datetimetable;
      INFO  : Compiling command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): select month, datetimestamp, unix_timestamp(datetimestamp, 'yyyy-MM-dd') as timestampCol from datetimetable
      INFO  : No Stats for default@datetimetable, Columns: month, datetimestamp
      INFO  : Semantic Analysis Completed (retrial = false)
      INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, type:string, comment:null), FieldSchema(name:datetimestamp, type:string, comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], properties:null)
      INFO  : Completed compiling command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time taken: 0.102 seconds
      INFO  : Operation QUERY obtained 0 locks
      INFO  : Executing command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): select month, datetimestamp, unix_timestamp(datetimestamp, 'yyyy-MM-dd') as timestampCol from datetimetable
      INFO  : Completed executing command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); Time taken: 0.0 seconds
      +--------+----------------+---------------+
      | month  | datetimestamp  | timestampcol  |
      +--------+----------------+---------------+
      | Feb    | 2001-02-28     | 983318400     |
      | Feb    | 2001-02-29     | 983318400     |
      | Feb    | 2001-02-30     | 983318400     |
      | Feb    | 2001-02-31     | 983318400     |
      | Feb    | 2001-02-32     | NULL          |
      +--------+----------------+---------------+
      5 rows selected (0.131 seconds)

       

       

       

      According to java jdk : 
      https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/java/time/format/ResolverStyle.java#L103

       

        /**
           * Style to resolve dates and times strictly.
           * <p>
           * Using strict resolution will ensure that all parsed values are within
           * the outer range of valid values for the field. Individual fields may
           * be further processed for strictness.
           * <p>
           * For example, resolving year-month and day-of-month in the ISO calendar
           * system using strict mode will ensure that the day-of-month is valid
           * for the year-month, rejecting invalid values.
           */
          STRICT,
          /**
           * Style to resolve dates and times in a smart, or intelligent, manner.
           * <p>
           * Using smart resolution will perform the sensible default for each
           * field, which may be the same as strict, the same as lenient, or a third
           * behavior. Individual fields will interpret this differently.
           * <p>
           * For example, resolving year-month and day-of-month in the ISO calendar
           * system using smart mode will ensure that the day-of-month is from
           * 1 to 31, converting any value beyond the last valid day-of-month to be
           * the last valid day-of-month.
           */
          SMART,

       

       

      By default, the DATETIME formatter uses the SMART resolution style and the SIMPLE formatter the LENIENT. Both of these styles are able to resolve "invalid" bounds to valid dates. In order to prevent seemingly "invalid" dates to be parsed correctly we have to use the STRICT resolution style. However, we cannot simply switch the formatters to always use the STRICT resolution cause that would break existing applications relying on the existing resolution rules. To address the problem reported here and retain the previous behaviour we opted to make the resolution style configurable by adding a new property. The new property only affects the DATETIME formatter; the SIMPLE formatter is almost deprecated so we don't add new features to it.

       

      Attachments

        Issue Links

          Activity

            People

              simhadri-g Simhadri Govindappa
              simhadri-g Simhadri Govindappa
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: