Spark / SPARK-27790

Support ANSI SQL INTERVAL types


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 3.2.0, 3.3.0
    • Component/s: SQL

    Description

      Spark has an INTERVAL data type, but it is “broken”:

      1. It cannot be persisted
      2. It is not comparable because it crosses the month-day line. That is, there is no telling whether "1 Month" is equal to "30 Days", since not all months have the same number of days.
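The comparability problem can be illustrated in plain Python (illustrative only, not Spark code): how many days "1 month" covers depends on which month you start in, so a month count has no fixed ordering against a day count.

```python
import calendar

# monthrange(year, month) returns (weekday of day 1, number of days in month).
jan_days = calendar.monthrange(2021, 1)[1]  # days in January 2021
feb_days = calendar.monthrange(2021, 2)[1]  # days in February 2021

# 31 vs 28: whether "1 Month" equals, exceeds, or falls short of "30 Days"
# depends on context, so a mixed month/day interval is not totally ordered.
print(jan_days, feb_days)
```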

      I propose to introduce the two flavors of INTERVAL described in the ANSI SQL standard and to deprecate Spark's existing interval type.

      • ANSI describes two non-overlapping "classes":
        • YEAR-MONTH,
        • DAY-SECOND ranges
      • Members within each class can be compared and sorted.
      • Both support datetime arithmetic.
      • Both can be persisted.
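A minimal Python sketch (not Spark's implementation) of why each ANSI class is totally ordered: a year-month interval reduces to a single month count, and a day-second interval to a fixed number of days/seconds/microseconds, so values within one class compare directly.

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass(frozen=True, order=True)
class YearMonthInterval:
    # Stored as a total number of months, so ordering is well defined.
    months: int

# 1-2 (14 months) compares cleanly against 1-1 (13 months).
assert YearMonthInterval(14) > YearMonthInterval(13)

# A day-second interval is essentially what Python's timedelta models:
# a fixed duration, also totally ordered.
assert timedelta(days=1, seconds=1) > timedelta(days=1)
```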

      The old and new flavors of INTERVAL can coexist until Spark's INTERVAL is eventually retired. Any semantic "breakage" can be controlled via legacy config settings.

      Milestone 1 – Spark interval equivalency (the new interval types meet or exceed all functionality of the existing interval type):

      • Add two new DataType implementations for interval year-month and day-second, including the JSON format and DDL string.
      • Infra support: check the caller sides of DateType/TimestampType
      • Support the two new interval types in Dataset/UDF.
      • Interval literals (with a legacy config to still allow mixed year-month and day-second fields and return legacy interval values)
      • Interval arithmetic (interval * num, interval / num, interval +/- interval)
      • Datetime functions/operators: Datetime - Datetime (to days or day second), Datetime +/- interval
      • Cast to and from the two new interval types, cast string to interval, cast interval to string (pretty printing), with SQL syntax to specify the types
      • Support sorting intervals.
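For the day-second class, the arithmetic listed above (interval * num, interval / num, interval +/- interval, Datetime +/- interval, Datetime - Datetime) matches the behavior of Python's timedelta. A hedged sketch of the intended semantics, not of Spark's internals:

```python
from datetime import datetime, timedelta

iv = timedelta(days=1, hours=2)                        # a day-second interval

assert iv * 2 == timedelta(days=2, hours=4)            # interval * num
assert iv / 2 == timedelta(hours=13)                   # interval / num
assert iv + iv == timedelta(days=2, hours=4)           # interval + interval

ts = datetime(2021, 1, 1)
assert ts + iv == datetime(2021, 1, 2, 2)              # Datetime + interval
assert datetime(2021, 1, 2) - ts == timedelta(days=1)  # Datetime - Datetime
```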

      Milestone 2 – Persistence:

      • Ability to create tables of type interval
      • Ability to write to common file formats such as Parquet and JSON.
      • INSERT, SELECT, UPDATE, MERGE
      • Discovery

      Milestone 3 – Client support

      • JDBC support
      • Hive Thrift server

      Milestone 4 – PySpark and Spark R integration

      • Python UDF can take and return intervals
      • DataFrame support
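On the Python side, such a UDF body is a plain function over datetime.timedelta, which is the natural Python counterpart of a day-time interval. A hedged sketch; the actual PySpark registration (wrapping the function with pyspark.sql.functions.udf and an interval return type) is left as an assumption and not shown here:

```python
from datetime import timedelta

def double_interval(iv: timedelta) -> timedelta:
    # Candidate Python UDF body: takes a day-time interval, returns one.
    # In PySpark this function would be registered as a UDF so it can be
    # applied to an interval column (registration details are assumptions).
    return iv * 2

assert double_interval(timedelta(hours=12)) == timedelta(days=1)
```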


            People

              Assignee: Max Gekk (maxgekk)
              Reporter: Max Gekk (maxgekk)
              Votes: 1
              Watchers: 10

              Dates

                Created:
                Updated:
                Resolved: