[SPARK-27790] Support ANSI SQL INTERVAL types - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.2.0
Fix Version/s: 3.3.0
Component/s: SQL
Labels:
- release-notes

Description

Spark has an INTERVAL data type, but it is “broken”:

It cannot be persisted.
It is not comparable because it crosses the month-day line. That is, there is no telling whether one “1 Month 1 Day” interval is as long as another “1 Month 1 Day” interval, since not all months have the same number of days.
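
To make the month-day problem concrete, here is a small sketch in plain Python (standard library only, not Spark code). The helper names `add_months` and `elapsed_days` are hypothetical, and clamping the day to the end of a shorter month is an assumption of this sketch. It shows that the same "1 month 1 day" value covers a different number of days depending on the start date:

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the month's end."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return d.replace(year=year, month=month0 + 1, day=min(d.day, last_day))


def elapsed_days(start: date, months: int, days: int) -> int:
    """Days actually covered by a mixed month/day interval applied at `start`."""
    end = add_months(start, months) + timedelta(days=days)
    return (end - start).days


# The same "1 month 1 day" interval applied at two different start dates.
print(elapsed_days(date(2021, 1, 15), 1, 1))  # crosses January (31 days)
print(elapsed_days(date(2021, 2, 15), 1, 1))  # crosses February 2021 (28 days)
```

The two calls print 32 and 29, so a mixed month/day value has no single length that could serve as a basis for ordering.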

I propose introducing the two flavors of INTERVAL described in the ANSI SQL Standard and deprecating Spark's existing interval type.

ANSI describes two non-overlapping “classes” of interval:
- YEAR-MONTH
- DAY-SECOND
Members within each class:
- can be compared and sorted,
- support datetime arithmetic,
- can be persisted.
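
As an illustration of why each class is comparable on its own, here is a minimal Python sketch. It assumes, as the round-trip sub-tasks (period <-> month, duration <-> micros) suggest, that a year-month interval reduces to a count of months and a day-time interval to a count of microseconds. The class names `YearMonthInterval` and `DayTimeInterval` are hypothetical and are not Spark's API:

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True, order=True)
class YearMonthInterval:
    months: int  # single comparable unit: total months

    @classmethod
    def of(cls, years: int = 0, months: int = 0) -> "YearMonthInterval":
        return cls(years * 12 + months)


@dataclass(frozen=True, order=True)
class DayTimeInterval:
    micros: int  # single comparable unit: total microseconds

    @classmethod
    def of(cls, days: int = 0, hours: int = 0, minutes: int = 0,
           seconds: int = 0) -> "DayTimeInterval":
        td = timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
        return cls(td // timedelta(microseconds=1))


# Within a class, values compare and sort like plain integers.
print(sorted([YearMonthInterval.of(years=1), YearMonthInterval.of(months=5)]))
print(DayTimeInterval.of(days=1) < DayTimeInterval.of(hours=25))

# Across classes there is no ordering: Python raises TypeError here.
try:
    YearMonthInterval.of(months=1) < DayTimeInterval.of(days=30)
except TypeError as err:
    print("not comparable:", err)
```

Keeping the two classes separate is what avoids the month-day ambiguity: no value ever mixes a month count with a day count.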

The old and new flavors of INTERVAL can coexist until the Spark INTERVAL is eventually retired. Any semantic “breakage” can also be controlled via legacy config settings.

Milestone 1 – Spark Interval equivalency (the new interval types meet or exceed all functionality of the existing SQL Interval):

Add two new DataType implementations for interval year-month and day-second. Includes the JSON format and DDL string.
Infra support: check the call sites of DateType/TimestampType.
Support the two new interval types in Dataset/UDF.
Interval literals (with a legacy config to still allow mixed year-month and day-second fields and return legacy interval values).
Interval arithmetic (interval * num, interval / num, interval +/- interval).
Datetime functions/operators: Datetime - Datetime (to days or day-second), Datetime +/- interval.
Cast to and from the two new interval types: cast string to interval and cast interval to string (pretty printing), with the SQL syntax to specify the types.
Support sorting intervals.
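
The string casts listed above can be sketched for the year-month case as follows. This is plain Python, not Spark's implementation: the function names are hypothetical, and the `years-months` text form is taken from the body of an ANSI literal such as `INTERVAL '1-2' YEAR TO MONTH`. The sketch assumes the value is held as a signed count of months:

```python
import re

# Optional sign, years, a dash, then months (0..11).
_YEAR_MONTH = re.compile(r"^([+-]?)(\d+)-(\d+)$")


def parse_year_month(text: str) -> int:
    """Parse 'years-months' into a signed total number of months."""
    match = _YEAR_MONTH.match(text.strip())
    if match is None:
        raise ValueError(f"not a year-month interval string: {text!r}")
    sign, years, months = match.groups()
    if int(months) > 11:
        raise ValueError(f"months field out of range 0..11: {text!r}")
    total = int(years) * 12 + int(months)
    return -total if sign == "-" else total


def format_year_month(total_months: int) -> str:
    """Format a signed month count back into 'years-months'."""
    sign = "-" if total_months < 0 else ""
    years, months = divmod(abs(total_months), 12)
    return f"{sign}{years}-{months}"


months = parse_year_month("1-2")      # string -> interval
print(months)
print(format_year_month(months * 2))  # interval * num, then interval -> string
```

Because the value is a single integer, the arithmetic in the milestone list (interval * num, interval +/- interval) reduces to integer arithmetic before formatting.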

Milestone 2 – Persistence:

Ability to create tables of type interval
Ability to write to common file formats such as Parquet and JSON.
INSERT, SELECT, UPDATE, MERGE
Discovery

Milestone 3 – Client support

JDBC support
Hive Thrift server

Milestone 4 – PySpark and Spark R integration

Python UDF can take and return intervals
DataFrame support

Issue Links

relates to

SPARK-9431 TimeIntervalType for time intervals

Resolved

links to

[Github] Pull Request #31614 (MaxGekk)

Sub-Tasks

1.	Add ANSI SQL day-time and year-month interval types	Resolved	Max Gekk
2.	Support java.time.Duration as an external type of the day-time interval type	Resolved	Max Gekk
3.	Support java.time.Period as an external type of the year-month interval type	Resolved	Max Gekk
4.	Update the Spark SQL guide about day-time and year-month interval types	Resolved	Max Gekk
5.	Test year-month and day-time intervals in UDF	Resolved	Max Gekk
6.	Test DayTimeIntervalType/YearMonthIntervalType as ordered and atomic types	Resolved	Max Gekk
7.	Support casting of year-month intervals to strings	Resolved	Max Gekk
8.	Support casting of day-time intervals to strings	Resolved	Max Gekk
9.	Support add and subtract of ANSI SQL intervals	Resolved	Max Gekk
10.	Overflow in round trip conversion from micros to duration	Resolved	Max Gekk
11.	Add round trip tests for period <-> month and duration <-> micros	Resolved	Jiaan Geng
12.	Support ANSI SQL intervals by the aggregate function `sum`	Resolved	Jiaan Geng
13.	Assign pretty names to YearMonthIntervalType and DayTimeIntervalType	Resolved	Max Gekk
14.	Add a year-month interval to a date	Resolved	Max Gekk
15.	Add a year-month interval to a timestamp	Resolved	Max Gekk
16.	Add a day-time interval to a timestamp	Resolved	Max Gekk
17.	Prohibit saving of day-time and year-month intervals	Resolved	Max Gekk
18.	Multiply year-month interval by numeric	Resolved	Max Gekk
19.	Support ANSI SQL intervals by the aggregate function `avg`	Resolved	Jiaan Geng
20.	Push ANSI interval binary expressions into (if / case) branches	Resolved	angerszhu
21.	Multiply day-time interval by numeric	Resolved	Max Gekk
22.	Divide year-month interval by numeric	Resolved	Max Gekk
23.	Divide day-time interval by numeric	Resolved	Max Gekk
24.	Test actual size of year-month and day-time intervals	Resolved	PengLei
25.	Hive inspect support DayTimeIntervalType and YearMonthIntervalType	Resolved	angerszhu
26.	Return day-time interval from dates subtraction	Resolved	Max Gekk
27.	Support ANSI intervals by date_part()	Resolved	Kent Yao 2
28.	Support cast long to DayTimeIntervalType and cast Int to YearMonthIntervalType	Resolved	Unassigned
29.	Return day-time interval from timestamps subtraction	Resolved	Max Gekk
30.	Enable ANSI intervals in SQLQueryTestSuite	Resolved	Max Gekk
31.	ANSI intervals formatting in hive results	Resolved	Max Gekk
32.	Format ANSI intervals in Hive style	Resolved	Max Gekk
33.	Transfer ANSI intervals via Hive Thrift server	Resolved	Max Gekk
34.	Construct year-month interval column from integral fields	Resolved	angerszhu
35.	Test transferring year-month interval via Hive Thrift server	Resolved	Max Gekk
36.	Cast string to year-month interval	Resolved	angerszhu
37.	Recognize sign before the interval string in literals	Resolved	Max Gekk
38.	Add a day-time interval to a date	Resolved	Max Gekk
39.	Parse interval literals as ANSI intervals	Resolved	Max Gekk
40.	Get columns operation should handle ANSI interval column properly	Resolved	Jiaan Geng
41.	Accept ANSI intervals by the Sequence expression	Resolved	Jiaan Geng
42.	Extract a field from ANSI interval	Resolved	Kent Yao 2
43.	IntervalUtils.fromYearMonthString can't handle Int.MinValue correctly	Resolved	angerszhu
44.	Use ANSI intervals in streaming join tests	Resolved	Kousuke Saruta
45.	Convert ANSI interval literals to SQL string	Resolved	Max Gekk
46.	Parse unit-to-unit interval literals to ANSI intervals	Resolved	Max Gekk
47.	Handle ANSI intervals in WindowExecBase	Resolved	Jiaan Geng
48.	Cast string to day-time interval	Resolved	angerszhu
49.	Support ANSI intervals in the Hash expression	Resolved	angerszhu
50.	Test ANSI interval literals	Resolved	Max Gekk
51.	Test ANSI intervals in MutableProjectionSuite	Resolved	Max Gekk
52.	The generated data fits the precision of DayTimeIntervalType in spark	Resolved	Jiaan Geng
53.	Add tests for ANSI intervals to HiveThriftBinaryServerSuite	Resolved	angerszhu
54.	Construct day-time interval column from integral fields	Resolved	Max Gekk
55.	Support ANSI intervals as Arrow Column vectors	Resolved	PengLei
56.	Override `sql()` of ANSI interval operators	Resolved	Max Gekk
57.	Wrong result of min ANSI interval division by -1	Resolved	angerszhu
58.	Failure on minimal interval literal	Resolved	angerszhu
59.	Support columnar execution on ANSI interval types	Resolved	Peng Lei
60.	Parse ANSI interval types in SQL	Resolved	Max Gekk
61.	Support fields by year-month interval type	Resolved	Max Gekk
62.	Support fields by the day-time interval type	Resolved	Max Gekk
63.	Truncate java.time.Duration by fields of day-time interval type	Resolved	angerszhu
64.	Return INTERVAL DAY from dates subtraction	Resolved	PengLei
65.	Check multiply/divide of day-time intervals of any fields by numeric	Resolved	PengLei
66.	Check all day-time interval types in aggregate expressions	Resolved	Kousuke Saruta
67.	Check all day-time interval types in UDF	Resolved	angerszhu
68.	Check all day-time interval types in arrow	Resolved	angerszhu
69.	Parse DayTimeIntervalType from JSON	Resolved	angerszhu
70.	Check all day-time interval types in HiveInspectors tests	Resolved	angerszhu
71.	Format day-time intervals using type fields	Resolved	Kousuke Saruta
72.	Take into account day-time interval fields in cast	Resolved	angerszhu
73.	Parse any day-time interval types in SQL	Resolved	Kousuke Saruta
74.	Parse day-time interval literals to tightest types	Resolved	Kousuke Saruta
75.	Parse unit list interval literals as year-month/day-time interval types	Resolved	Kousuke Saruta
76.	Take into account year-month interval fields in cast	Resolved	angerszhu
77.	Truncate java.time.Period by fields of year-month interval type	Resolved	angerszhu
78.	Parse YearMonthIntervalType from JSON	Resolved	angerszhu
79.	Format year-month intervals using type fields	Resolved	Kousuke Saruta
80.	Check all year-month interval types in HiveInspectors tests	Resolved	angerszhu
81.	Parse year-month interval literals to tightest types	Resolved	Kousuke Saruta
82.	Parse any year-month interval types in SQL	Resolved	Kousuke Saruta
83.	Check all year-month interval types in aggregate expressions	Resolved	Kousuke Saruta
84.	Check all year-month interval types in arrow	Resolved	Apache Spark
85.	Check all year-month interval types in UDF	Resolved	angerszhu
86.	Check multiply/divide of year-month intervals of any fields by numeric	Resolved	PengLei
87.	Allow delayThreshold for watermark to be represented as ANSI day-time/year-month interval literals	Resolved	Kousuke Saruta
88.	Support cast between different field YearMonthIntervalType	Resolved	angerszhu
89.	Support cast between different DayTimeIntervalType	Resolved	angerszhu
90.	Show proper error message when update column types to year-month/day-time interval	Resolved	Apache Spark
91.	Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType`	Resolved	Max Gekk
92.	Improve the implementation for DateType +/- DayTimeIntervalType(DAY)	Resolved	PengLei
93.	Move new interval type test cases from CastSuite to CastBaseSuite	Resolved	Gengliang Wang
94.	Support upcast between different field of YearMonthIntervalType/DayTimeIntervalType	Resolved	angerszhu
95.	Literal.create(value, dataType) should support fields	Resolved	angerszhu
96.	Unify IntervalUtils.castStringToYMInterval with parser	Open	Unassigned
97.	Unify IntervalUtils.castStringToDTInterval with parser	Open	Unassigned
98.	Support DayTimeIntervalType in width-bucket function	Resolved	PengLei
99.	Support YearMonthIntervalType in width-bucket function	Resolved	PengLei
100.	Allow from_json/to_json for map types where value types are year-month intervals	Resolved	Kousuke Saruta
101.	Allow from_json/to_json for map types where value types are day-time intervals	Resolved	Kousuke Saruta
102.	Make from_csv/to_csv to handle year-month intervals properly	Resolved	Kousuke Saruta
103.	Make from_csv/to_csv to handle day-time intervals properly	Resolved	Kousuke Saruta
104.	Incorrect parsing of the start field in interval literals	Resolved	angerszhu
105.	Respect interval fields in extract	Resolved	Kousuke Saruta
106.	Confusing error from casting a string to ANSI interval	Resolved	angerszhu
107.	Interval str should handle start field == end Field	Resolved	Unassigned
108.	Remove IntervalUnit in code	Resolved	angerszhu
109.	Change quoted interval literal (interval constructor) to be converted to ANSI interval types	Resolved	Kousuke Saruta
110.	SparkScriptTransformation should support ANSI interval types	Resolved	Kousuke Saruta
111.	Step by days in the Sequence expression for dates	Resolved	Jiaan Geng
112.	Update docs about mapping of ANSI interval types to Java/Scala/SQL types	Resolved	Max Gekk
113.	Support ANSI interval literals for TimeWindow	Resolved	Kousuke Saruta
114.	Disallow ANSI intervals in file-based datasources	Resolved	Max Gekk
115.	Support comparison of ANSI intervals with different fields	Resolved	angerszhu
116.	Update docs about ANSI interval literals	Resolved	Max Gekk
117.	Support cast type constructed string to year month interval	In Progress	Unassigned
118.	Support type constructed string as day time interval	In Progress	Unassigned
119.	Support Interval add/subtract NULL	Resolved	Gengliang Wang
120.	Test Interval multiply / divide null	Resolved	Gengliang Wang
121.	Disallow comparison between Interval and String	Resolved	Gengliang Wang
122.	Add common class/trait for ANSI interval types	Resolved	Max Gekk
123.	DivideYMInterval and DivideDTInterval should throw the same exception when divide by zero.	Resolved	Jiaan Geng
124.	DivideDTInterval should throw the same exception when divide by zero.	Resolved	Unassigned
125.	day-time interval types should respect daylight saving time correctly	Open	Unassigned
126.	Merge ANSI interval types to a tightest common type	Resolved	Max Gekk
127.	Read/write dataframes with ANSI intervals from/to parquet files	Resolved	Max Gekk
128.	Read/write dataframes with ANSI intervals from/to JSON files	Resolved	Kousuke Saruta
129.	Read/write dataframes with ANSI intervals from/to CSV files	Resolved	Kousuke Saruta
130.	Incorrect parsing of negative ANSI typed interval literals	Resolved	Peng Lei
131.	Test ANSI interval support by the Parquet datasource	Resolved	Max Gekk
132.	Parquet reader fails on load of ANSI interval when off-heap is enabled	Resolved	Max Gekk
133.	Pushdown filters with ANSI interval values to parquet	Resolved	Max Gekk
134.	Support ANSI intervals by ABS	Resolved	Max Gekk
135.	The DIV function should support ANSI intervals	Resolved	PengLei
136.	The SIGN/SIGNUM functions should support ANSI intervals	Resolved	PengLei
137.	Allow coercing of an interval expression to a specific interval type	Open	Unassigned
138.	CAST between ANSI intervals and numerics	Resolved	PengLei
139.	Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray	Resolved	PengLei
140.	Read/write dataframes with ANSI intervals from/to ORC files	Resolved	Kousuke Saruta
141.	Check saving of a dataframe with ANSI intervals to a Hive parquet table	Resolved	Apache Spark
142.	Check CREATE TABLE with ANSI intervals using Hive external catalog and Parquet	Resolved	Max Gekk
143.	Fix CREATE TABLE AS SELECT of ANSI intervals	Resolved	Max Gekk
144.	Pushdown filters with ANSI interval values to ORC	Resolved	Kousuke Saruta
145.	Support AnsiInterval radix sort	Resolved	XiDuo You
146.	Support ANSI Interval in functions that support numeric type	Resolved	angerszhu
147.	RowToColumnConverter support AnsiIntervalType	Resolved	PengLei
148.	Read/write dataframes with ANSI intervals from/to Avro files	Resolved	Max Gekk
149.	Dynamic writes/reads of ANSI interval partitions	Resolved	Max Gekk
150.	Cannot read partitioned parquet files with ANSI interval partition values	Resolved	Max Gekk
151.	Check adding partitions with ANSI intervals	Resolved	Max Gekk
152.	Check inserting of ANSI intervals into a table partitioned by the interval columns	Resolved	Max Gekk
153.	Check replacing columns with ANSI intervals	Resolved	Max Gekk
154.	Check adding of ANSI interval columns to v1/v2 tables	Resolved	Max Gekk
155.	Support casting integrals to intervals in ANSI mode	Resolved	Max Gekk
156.	Support casting intervals to integrals in ANSI mode	Resolved	Max Gekk
157.	Support cast of ANSI intervals to decimals	Resolved	Max Gekk
158.	Support cast of decimals to ANSI intervals	Resolved	Max Gekk
159.	Document cast of ANSI intervals	Resolved	Max Gekk

People

Assignee: Max Gekk

Reporter: Max Gekk

Votes: 1

Watchers: 10

Dates

Created: 21/May/19 11:46

Updated: 22/Mar/22 06:13

Resolved: 22/Mar/22 06:13