Spark / SPARK-35662

Support Timestamp without time zone data type

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: SQL
    • Labels: None

    Description

      Spark SQL today supports the TIMESTAMP data type. However, its semantics actually match TIMESTAMP WITH LOCAL TIME ZONE as defined by Oracle: timestamps embedded in a SQL query or passed through JDBC are presumed to be in the session-local time zone and are converted to UTC before being processed.
      These are desirable semantics in many cases, such as when dealing with calendars.
      In many other (and arguably more) cases, such as when dealing with log files, it is desirable that the provided timestamps not be altered.
      SQL users expect to be able to model either behavior: TIMESTAMP WITHOUT TIME ZONE for time-zone-insensitive data and TIMESTAMP WITH LOCAL TIME ZONE for time-zone-sensitive data.
      Most traditional RDBMSs map TIMESTAMP to TIMESTAMP WITHOUT TIME ZONE, so their users will be surprised to see TIMESTAMP WITH LOCAL TIME ZONE, a type that does not exist in the SQL standard.
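
The difference between the two semantics can be sketched with Python's standard datetime module. This is a simplified model under stated assumptions, not Spark's implementation. It assumes both kinds are stored as a count of microseconds since 1970-01-01. The helper names (store_ltz, render_ltz, store_ntz, render_ntz) are hypothetical. Fixed UTC offsets stand in for session time zones so the example needs no time zone database.

```python
from datetime import datetime, timedelta, timezone

# Illustrative model only: both kinds are stored as microseconds since
# 1970-01-01, but they differ in how that number is produced and rendered.
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)
EPOCH_NAIVE = datetime(1970, 1, 1)
ONE_MICRO = timedelta(microseconds=1)
FMT = "%Y-%m-%d %H:%M:%S"


def store_ltz(literal: str, session_tz: timezone) -> int:
    """WITH LOCAL TIME ZONE: read the literal in the session zone, store UTC."""
    local = datetime.fromisoformat(literal).replace(tzinfo=session_tz)
    return (local - EPOCH_UTC) // ONE_MICRO


def render_ltz(micros: int, session_tz: timezone) -> str:
    """Display the stored instant in whatever the current session zone is."""
    return (EPOCH_UTC + micros * ONE_MICRO).astimezone(session_tz).strftime(FMT)


def store_ntz(literal: str) -> int:
    """WITHOUT TIME ZONE: keep the wall-clock fields exactly as written."""
    return (datetime.fromisoformat(literal) - EPOCH_NAIVE) // ONE_MICRO


def render_ntz(micros: int) -> str:
    """Display the wall-clock fields unchanged; no session zone is involved."""
    return (EPOCH_NAIVE + micros * ONE_MICRO).strftime(FMT)


# Fixed offsets stand in for two different session time zones.
SESSION_A = timezone(timedelta(hours=-8))
SESSION_B = timezone(timedelta(hours=9))

log_entry = "2021-06-01 12:00:00"
ltz = store_ltz(log_entry, SESSION_A)  # ingested while the session was UTC-8
ntz = store_ntz(log_entry)

print(render_ltz(ltz, SESSION_A))  # 2021-06-01 12:00:00
print(render_ltz(ltz, SESSION_B))  # 2021-06-02 05:00:00 (shifted)
print(render_ntz(ntz))             # 2021-06-01 12:00:00 (unchanged)
```

The log entry keeps its written value only in the WITHOUT TIME ZONE model. The WITH LOCAL TIME ZONE value follows the session zone, which is the calendar-style behavior described above.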

      With this new feature, we will introduce TIMESTAMP WITH LOCAL TIME ZONE to describe the existing timestamp type and add TIMESTAMP WITHOUT TIME ZONE for standard semantics.
      Having these two explicit types will make the intended behavior clear.
      We will also allow users to configure whether the bare TIMESTAMP keyword defaults to TIMESTAMP WITH LOCAL TIME ZONE or TIMESTAMP WITHOUT TIME ZONE.
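
The proposed default can be sketched as a small type-name resolver. This is a hypothetical illustration: the TimestampKind enum and resolve_timestamp_type function are invented here, and the `default` argument stands in for a user-level setting whose real name is not specified in this issue. The explicit spellings always win, and only the bare keyword TIMESTAMP consults the default.

```python
from enum import Enum


class TimestampKind(Enum):
    LTZ = "TIMESTAMP WITH LOCAL TIME ZONE"
    NTZ = "TIMESTAMP WITHOUT TIME ZONE"


def resolve_timestamp_type(type_name: str, default: TimestampKind) -> TimestampKind:
    """Map a SQL type name to a concrete timestamp kind.

    Explicit spellings are taken literally; the bare keyword TIMESTAMP
    falls back to the (hypothetical) user-configured default.
    """
    normalized = " ".join(type_name.upper().split())  # collapse whitespace
    for kind in TimestampKind:
        if normalized == kind.value:
            return kind
    if normalized == "TIMESTAMP":
        return default
    raise ValueError(f"not a timestamp type: {type_name!r}")


print(resolve_timestamp_type("timestamp", TimestampKind.LTZ))  # TimestampKind.LTZ
print(resolve_timestamp_type("timestamp", TimestampKind.NTZ))  # TimestampKind.NTZ
print(resolve_timestamp_type("TIMESTAMP  WITHOUT TIME ZONE", TimestampKind.LTZ))  # TimestampKind.NTZ
```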

      Milestone 1 – Spark Timestamp equivalency (the new TimestampWithoutTZ type meets or exceeds all functionality of the existing SQL Timestamp):

      • Add a new DataType implementation for TimestampWithoutTZ.
      • Support TimestampWithoutTZ in Dataset/UDF.
      • TimestampWithoutTZ literals
      • TimestampWithoutTZ arithmetic (e.g. TimestampWithoutTZ - TimestampWithoutTZ, TimestampWithoutTZ - Date)
      • Datetime functions/operators: dayofweek, weekofyear, year, etc.
      • Casts to and from TimestampWithoutTZ: cast String/Timestamp to TimestampWithoutTZ and cast TimestampWithoutTZ to String (pretty printing)/Timestamp, with SQL syntax to specify the types
      • Support sorting TimestampWithoutTZ.
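
The Milestone 1 operations can be modeled in plain Python by treating a TimestampWithoutTZ value as a naive datetime. This is an illustrative sketch, not Spark code, and all helper names are hypothetical. dayofweek follows Spark's documented convention (1 = Sunday, ..., 7 = Saturday). weekofyear uses ISO 8601 weeks. The casts to and from the existing Timestamp assume the session time zone is supplied as a fixed offset.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical model: a TimestampWithoutTZ value is a naive datetime (no tzinfo).
FMT = "%Y-%m-%d %H:%M:%S"


def ntz_literal(text: str) -> datetime:
    """Parse a literal such as '2021-06-01 12:30:00' without applying any zone."""
    value = datetime.fromisoformat(text)
    if value.tzinfo is not None:
        raise ValueError("a TimestampWithoutTZ literal must not carry an offset")
    return value


def ntz_subtract(left: datetime, right) -> timedelta:
    """TimestampWithoutTZ - TimestampWithoutTZ, or - Date (Date read as midnight)."""
    if not isinstance(right, datetime):  # datetime is a subclass of date
        right = datetime.combine(right, datetime.min.time())
    return left - right


def dayofweek(ts: datetime) -> int:
    """Spark's convention: 1 = Sunday, 2 = Monday, ..., 7 = Saturday."""
    return ts.isoweekday() % 7 + 1


def weekofyear(ts: datetime) -> int:
    """ISO 8601 week number (weeks start on Monday)."""
    return ts.isocalendar()[1]


def ntz_to_string(ts: datetime) -> str:
    """Pretty-print the wall-clock fields."""
    return ts.strftime(FMT)


def ntz_to_ltz(ts: datetime, session_tz: timezone) -> datetime:
    """Cast to the existing Timestamp: read the wall clock in the session zone."""
    return ts.replace(tzinfo=session_tz).astimezone(timezone.utc)


def ltz_to_ntz(instant: datetime, session_tz: timezone) -> datetime:
    """Cast from the existing Timestamp: keep only the session-zone wall clock."""
    return instant.astimezone(session_tz).replace(tzinfo=None)


a = ntz_literal("2021-06-06 08:00:00")  # a Sunday
b = ntz_literal("2021-06-01 12:30:00")

print(ntz_subtract(a, b))                   # 4 days, 19:30:00
print(ntz_subtract(a, date(2021, 6, 1)))    # 5 days, 8:00:00
print(dayofweek(a), weekofyear(a), a.year)  # 1 22 2021
print([ntz_to_string(t) for t in sorted([a, b])])
# ['2021-06-01 12:30:00', '2021-06-06 08:00:00']
```

Sorting comes for free in this model because naive datetimes compare by their wall-clock fields. No time zone is ever consulted.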

      Milestone 2 – Persistence:

      • Ability to create tables of type TimestampWithoutTZ
      • Ability to write to common file formats such as Parquet and JSON.
      • INSERT, SELECT, UPDATE, MERGE
      • Discovery
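
One way to keep the two kinds distinguishable after a round trip through a text format such as JSON is sketched below. The encoding and the discovery rule are hypothetical illustrations, not Spark's on-disk format: values without a time zone are written with no offset, and instants are written with an explicit offset. A reader can then tell the two apart when inferring a schema.

```python
import json
from datetime import datetime, timezone

# Hypothetical JSON encoding (not Spark's format): values without a time
# zone are written with no offset; instants are written with one.


def encode(value: datetime) -> str:
    """Serialize a datetime, keeping an offset only if the value has one."""
    return value.isoformat(sep=" ")


def infer_kind(text: str) -> str:
    """Hypothetical discovery rule: an explicit offset implies LTZ, none implies NTZ."""
    parsed = datetime.fromisoformat(text)
    return "LTZ" if parsed.tzinfo is not None else "NTZ"


row = {
    "logged_at": datetime(2021, 6, 1, 12, 30),                          # NTZ
    "processed_at": datetime(2021, 6, 1, 20, 30, tzinfo=timezone.utc),  # LTZ
}
line = json.dumps({name: encode(value) for name, value in row.items()})
print(line)
# {"logged_at": "2021-06-01 12:30:00", "processed_at": "2021-06-01 20:30:00+00:00"}

restored = json.loads(line)
print({name: infer_kind(text) for name, text in restored.items()})
# {'logged_at': 'NTZ', 'processed_at': 'LTZ'}
```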

      Milestone 3 – Client support

      • JDBC support
      • Hive Thrift server

      Milestone 4 – PySpark and SparkR integration

      • Python UDFs can take and return TimestampWithoutTZ
      • DataFrame support

People

    • Apache Spark (apachespark)
    • Gengliang Wang (Gengliang.Wang)
    • Votes: 0
    • Watchers: 6