SPARK-33404 (Spark)

"date_trunc" expression returns incorrect results


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, 3.0.1, 3.1.0
    • Fix Version/s: 3.0.2, 3.1.0
    • Component/s: SQL
    • Labels:

      Description

      The `date_trunc` SQL expression returns incorrect results for the 'minute' format string.

      Context: truncating with the 'minute' format string should set the seconds (and any fractional seconds) of the timestamp to zero.
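      The expected semantics can be sketched with plain `java.time`, assuming the input is interpreted as a wall-clock time in the session time zone. `ExpectedMinuteTrunc` is a hypothetical helper for illustration only, not part of Spark.

```scala
import java.time.LocalDateTime
import java.time.temporal.ChronoUnit

// Hypothetical helper: illustrates the intended result of date_trunc('minute', ...).
object ExpectedMinuteTrunc {
  // Truncate a wall-clock timestamp to the minute: seconds and nanos become zero.
  def truncToMinute(ts: LocalDateTime): LocalDateTime =
    ts.truncatedTo(ChronoUnit.MINUTES)

  def main(args: Array[String]): Unit = {
    val ts = LocalDateTime.parse("1769-10-17T17:10:02")
    // LocalDateTime.toString omits the seconds field when it is zero.
    println(truncToMinute(ts)) // prints 1769-10-17T17:10
  }
}
```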

      Repro (run the following commands in spark-shell):

      spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
      spark.sql("SELECT date_trunc('minute', '1769-10-17 17:10:02')").show()

      Spark currently returns the incorrect value

      1769-10-17 17:10:02

      instead of the expected value

      1769-10-17 17:10:00

      This happens because `truncTimestamp` in package `org.apache.spark.sql.catalyst.util.DateTimeUtils` incorrectly assumes that time zone offsets are always a whole number of minutes (never with a seconds component), and therefore does not apply the time zone adjustment when truncating a timestamp to the minute. The assumption fails here: in 1769, America/Los_Angeles used local mean time, with an offset of -07:52:58.
      This shortcut is currently used when truncating timestamps to microsecond, millisecond, second, or minute.
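      The mechanism can be reproduced in plain Scala with `java.time`, no Spark required. `MinuteTruncSketch` and its helpers are hypothetical and only approximate the idea described above; they are not the actual `DateTimeUtils.truncTimestamp` code. With the 1769 offset of -07:52:58, the local time 1769-10-17 17:10:02 corresponds to exactly 1769-10-18 01:03:00 UTC. Flooring the UTC-based microseconds to a whole minute therefore changes nothing and reproduces the wrong result, while truncating the local wall-clock time gives 17:10:00.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}
import java.time.temporal.ChronoUnit

// Hypothetical sketch, not Spark's DateTimeUtils code: contrasts a zone-unaware
// and a zone-aware truncation of epoch microseconds to the minute.
object MinuteTruncSketch {
  val MicrosPerMinute: Long = 60L * 1000L * 1000L

  // Wall-clock time in `zone` -> microseconds since 1970-01-01T00:00:00Z.
  def toMicros(local: LocalDateTime, zone: ZoneId): Long =
    ChronoUnit.MICROS.between(Instant.EPOCH, local.atZone(zone).toInstant)

  // Microseconds since the epoch -> wall-clock time in `zone`.
  def toLocal(micros: Long, zone: ZoneId): LocalDateTime =
    LocalDateTime.ofInstant(Instant.EPOCH.plus(micros, ChronoUnit.MICROS), zone)

  // Zone-unaware: floor the UTC-based micros to a multiple of one minute.
  // Only correct if the zone offset is a whole number of minutes.
  def naiveTruncToMinute(micros: Long): Long =
    micros - Math.floorMod(micros, MicrosPerMinute)

  // Zone-aware: truncate the local wall-clock time, then convert back to micros.
  def zoneAwareTruncToMinute(micros: Long, zone: ZoneId): Long = {
    val zoned = Instant.EPOCH.plus(micros, ChronoUnit.MICROS).atZone(zone)
    ChronoUnit.MICROS.between(Instant.EPOCH, zoned.truncatedTo(ChronoUnit.MINUTES).toInstant)
  }

  def main(args: Array[String]): Unit = {
    val la = ZoneId.of("America/Los_Angeles")
    val ts = toMicros(LocalDateTime.parse("1769-10-17T17:10:02"), la)
    // The 1769 offset has a seconds component.
    println(la.getRules.getOffset(Instant.EPOCH.plus(ts, ChronoUnit.MICROS))) // -07:52:58
    println(toLocal(naiveTruncToMinute(ts), la))         // 1769-10-17T17:10:02 (wrong)
    println(toLocal(zoneAwareTruncToMinute(ts, la), la)) // 1769-10-17T17:10 (expected)
  }
}
```

      The zone-aware variant delegates to `ZonedDateTime.truncatedTo`, which truncates the local date-time and then resolves it back to an instant in the same zone, so it does not depend on the offset being minute-aligned.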


            People

            • Assignee:
              utkarsh39 Utkarsh Agarwal
              Reporter:
              utkarsh39 Utkarsh Agarwal
