Description
Spark SQL today supports the TIMESTAMP data type. However, the semantics it provides actually match TIMESTAMP WITH LOCAL TIME ZONE as defined by Oracle. Timestamps embedded in a SQL query or passed through JDBC are presumed to be in the session-local time zone and are converted to UTC before being processed.
These semantics are desirable in many cases, such as when dealing with calendars.
In many other cases, such as when dealing with log files, it is desirable that the provided timestamps not be altered.
SQL users expect to be able to model either behavior, using TIMESTAMP WITHOUT TIME ZONE for time-zone-insensitive data and TIMESTAMP WITH LOCAL TIME ZONE for time-zone-sensitive data.
Most traditional RDBMSs map TIMESTAMP to TIMESTAMP WITHOUT TIME ZONE, so their users will be surprised to see TIMESTAMP WITH LOCAL TIME ZONE, a type that does not exist in the SQL standard.
With this new feature, we will introduce TIMESTAMP WITH LOCAL TIME ZONE to describe the existing timestamp type and add TIMESTAMP WITHOUT TIME ZONE for the standard semantics.
Using these two types will provide clarity.
We will also allow users to set the default behavior for TIMESTAMP to either use TIMESTAMP WITH LOCAL TIME ZONE or TIMESTAMP WITHOUT TIME ZONE.
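To make the difference between the two semantics concrete, here is a plain-Python sketch (not Spark code) using the standard `datetime` and `zoneinfo` modules. The session time zone `America/Los_Angeles` is an arbitrary choice for illustration:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# A timestamp literal as a user would write it: '2021-06-01 12:00:00'.
wall_clock = datetime(2021, 6, 1, 12, 0, 0)  # naive: no time zone attached

# TIMESTAMP WITHOUT TIME ZONE semantics: the wall-clock value is kept as-is.
stored_without_tz = wall_clock

# TIMESTAMP WITH LOCAL TIME ZONE semantics (Spark's current TIMESTAMP):
# interpret the literal in the session time zone, then normalize to UTC.
session_tz = ZoneInfo("America/Los_Angeles")
stored_with_local_tz = wall_clock.replace(tzinfo=session_tz).astimezone(ZoneInfo("UTC"))

print(stored_without_tz)     # 2021-06-01 12:00:00 (unaltered)
print(stored_with_local_tz)  # 2021-06-01 19:00:00+00:00 (shifted to UTC; PDT is UTC-7)
```

Changing the session time zone changes the stored instant in the second case but never in the first, which is why log-file style data is better served by TIMESTAMP WITHOUT TIME ZONE.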
Milestone 1 – Spark Timestamp equivalency (the new TimestampWithoutTZ type meets or exceeds all functionality of the existing SQL Timestamp):
- Add a new DataType implementation for TimestampWithoutTZ.
- Support TimestampWithoutTZ in Dataset/UDF.
- TimestampWithoutTZ literals
- TimestampWithoutTZ arithmetic (e.g., TimestampWithoutTZ - TimestampWithoutTZ, TimestampWithoutTZ - Date)
- Datetime functions/operators: dayofweek, weekofyear, year, etc.
- Casts to and from TimestampWithoutTZ: cast String/Timestamp to TimestampWithoutTZ, and cast TimestampWithoutTZ to String (pretty printing)/Timestamp, with SQL syntax to specify the types
- Support sorting TimestampWithoutTZ.
Milestone 2 – Persistence:
- Ability to create tables with columns of type TimestampWithoutTZ
- Ability to write to common file formats such as Parquet and JSON.
- INSERT, SELECT, UPDATE, MERGE
- Discovery
Milestone 3 – Client support
- JDBC support
- Hive Thrift server
Milestone 4 – PySpark and Spark R integration
- Python UDF can take and return TimestampWithoutTZ
- DataFrame support
Issue Links
- is duplicated by SPARK-28955 "Support for LocalDateTime semantics" (Resolved)