[SPARK-34246] New type coercion syntax rules in ANSI mode - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.1.2
Fix Version/s: 3.2.0
Component/s: SQL
Labels: None
Target Version/s: 3.2.0
Epic Link: ANSI SQL compliance

Description

Add new implicit cast syntax rules in ANSI mode.
In Spark ANSI mode, the type coercion rules are based on the type precedence lists of the input data types.
As per the section "Type precedence list determination" of "ISO/IEC 9075-2:2011
Information technology — Database languages - SQL — Part 2: Foundation (SQL/Foundation)", the type precedence lists of primitive
data types are as follows:

Byte: Byte, Short, Int, Long, Decimal, Float, Double
Short: Short, Int, Long, Decimal, Float, Double
Int: Int, Long, Decimal, Float, Double
Long: Long, Decimal, Float, Double
Decimal: Any wider Numeric type
Float: Float, Double
Double: Double
String: String
Date: Date, Timestamp
Timestamp: Timestamp
Binary: Binary
Boolean: Boolean
Interval: Interval
As for complex data types, Spark determines the precedence list recursively based on their sub-types.
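As an illustration, the precedence lists above can be written down as a plain lookup table, with the recursive rule for complex types sketched for arrays only. This is a minimal sketch, not Spark's implementation: the lowercase type names, the `("array", element)` tuple encoding, and the `precedence_list` helper are assumptions made for this example, and "any wider numeric type" for Decimal is simplified to Decimal, Float, Double without modelling precision or scale.

```python
# Precedence lists copied from the issue description.
# The lowercase names are labels for this sketch, not Spark identifiers.
PRECEDENCE = {
    "byte": ["byte", "short", "int", "long", "decimal", "float", "double"],
    "short": ["short", "int", "long", "decimal", "float", "double"],
    "int": ["int", "long", "decimal", "float", "double"],
    "long": ["long", "decimal", "float", "double"],
    # Assumption: "any wider numeric type" is modelled as a fixed list,
    # ignoring decimal precision and scale.
    "decimal": ["decimal", "float", "double"],
    "float": ["float", "double"],
    "double": ["double"],
    "string": ["string"],
    "date": ["date", "timestamp"],
    "timestamp": ["timestamp"],
    "binary": ["binary"],
    "boolean": ["boolean"],
    "interval": ["interval"],
}


def precedence_list(data_type):
    """Return the precedence list of a primitive name or an ("array", element) tuple."""
    if isinstance(data_type, tuple) and data_type[0] == "array":
        # Complex type: derive the list recursively from the element type.
        return [("array", element) for element in precedence_list(data_type[1])]
    return list(PRECEDENCE[data_type])


print(precedence_list("date"))
print(precedence_list(("array", "float")))
```

Structs and maps would follow the same pattern, recursing into each field or into the key and value types.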

With the definition of the type precedence list, the general type coercion rules are as follows:

Data type S can be implicitly cast to type T iff T is in the precedence list of S.
Comparison is allowed iff the precedence lists of both sides have at least one common element. When evaluating the comparison, Spark casts both sides to the tightest common data type of their precedence lists.
For the following operators, the precedence lists of all children must share at least one common data type. The data type of the operator is the tightest common data type of those precedence lists.

In
Except(odd)
Intersect
Greatest
Least
Union
If
CaseWhen
CreateArray
Array Concat
Sequence
MapConcat
CreateMap
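The three rules above can be sketched in a few lines of Python. The helper names `can_implicit_cast` and `tightest_common_type` are hypothetical and exist only in this example, the Decimal list is simplified to Decimal, Float, Double, and the sketch covers only the rules in this description, not the string-related sub-tasks listed further down.

```python
# The numeric precedence lists are suffixes of one chain, so build them by slicing.
NUMERIC_CHAIN = ["byte", "short", "int", "long", "decimal", "float", "double"]
PRECEDENCE = {name: NUMERIC_CHAIN[i:] for i, name in enumerate(NUMERIC_CHAIN)}
PRECEDENCE.update({
    "string": ["string"],
    "date": ["date", "timestamp"],
    "timestamp": ["timestamp"],
    "binary": ["binary"],
    "boolean": ["boolean"],
    "interval": ["interval"],
})


def can_implicit_cast(source, target):
    """Rule 1: S may be implicitly cast to T iff T is in the precedence list of S."""
    return target in PRECEDENCE[source]


def tightest_common_type(*types):
    """Rules 2 and 3: earliest type shared by every precedence list, or None."""
    first, *rest = types
    for candidate in PRECEDENCE[first]:
        if all(candidate in PRECEDENCE[other] for other in rest):
            return candidate
    return None  # no common element: the comparison or operator is rejected


print(can_implicit_cast("int", "double"))            # widening is allowed
print(can_implicit_cast("double", "int"))            # narrowing is not
print(tightest_common_type("int", "long", "float"))  # e.g. children of Greatest
print(tightest_common_type("string", "int"))         # no common type
```

Scanning the first list in order is sufficient here because any two of the lists are either disjoint or one is a suffix of the other, so the first shared entry is also the tightest one.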

For complex types (struct, array, map), Spark recursively looks into the element type and applies the rules above. If the element nullability is converted from true to false, Spark adds a runtime null check on the elements.
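A rough sketch of the recursive check for arrays, including the nullability rule, might look as follows. The `ArrayType` dataclass and the `plan_cast` helper are stand-ins invented for this example and are not Spark's classes; only a subset of the primitive precedence lists is included, and structs and maps are left out for brevity.

```python
from dataclasses import dataclass

# Subset of the primitive precedence lists, enough for this illustration.
NUMERIC_CHAIN = ["byte", "short", "int", "long", "decimal", "float", "double"]
PRECEDENCE = {name: NUMERIC_CHAIN[i:] for i, name in enumerate(NUMERIC_CHAIN)}
PRECEDENCE.update({"date": ["date", "timestamp"], "timestamp": ["timestamp"]})


@dataclass(frozen=True)
class ArrayType:
    """Hypothetical array type: an element type plus element nullability."""
    element: object
    contains_null: bool


def plan_cast(source, target):
    """Return (allowed, needs_null_check) for an implicit cast from source to target."""
    if isinstance(source, ArrayType) and isinstance(target, ArrayType):
        # Recurse into the element types and apply the primitive rule there.
        allowed, inner_check = plan_cast(source.element, target.element)
        # Nullable elements cast to non-nullable ones need a runtime null check.
        tightened = source.contains_null and not target.contains_null
        return allowed, allowed and (inner_check or tightened)
    if isinstance(source, ArrayType) or isinstance(target, ArrayType):
        return False, False  # array versus primitive: no implicit cast
    # Types outside the subset are treated as castable only to themselves.
    return target in PRECEDENCE.get(source, [source]), False


print(plan_cast(ArrayType("int", True), ArrayType("long", False)))
print(plan_cast(ArrayType("int", False), ArrayType("long", True)))
print(plan_cast(ArrayType("double", True), ArrayType("int", True)))
```

Returning the null-check flag separately from the allowed flag keeps the analysis-time decision (is the cast legal) apart from the runtime obligation (verify that no element is null).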

Issue Links

is related to: SPARK-38860 ANSI enhancements in Spark 3.3 (Open)
relates to: SPARK-35030 ANSI SQL compliance (Resolved)
links to: [Github] Pull Request #31349 (gengliangwang)
links to: [Github] Pull Request #32493 (gengliangwang)

Sub-Tasks

1.	Allow implicit casting string literal to other data types under ANSI mode	Resolved	Gengliang Wang
2.	Add documentation for ANSI implicit cast rules	Resolved	Gengliang Wang
3.	Add rule WindowFrameCoercion into ANSI implicit cast rules	Resolved	Gengliang Wang
4.	Clean up AnsiTypeCoercionSuite and TypeCoercionSuite	Resolved	Unassigned
5.	AnsiTypeCoercion: return narrowest convertible type among TypeCollection	Resolved	Gengliang Wang
6.	Extracting date field from timestamp should work in ANSI mode	Resolved	Gengliang Wang
7.	ANSI type coercion rule for date time operations	Resolved	Gengliang Wang
8.	Disallow binary operations between Interval and String literal	Resolved	Gengliang Wang
9.	ANSI mode: Use store assignment rules for resolving function invocation	Resolved	Gengliang Wang
10.	Show hint if analyzer fails due to ANSI type coercion	Resolved	Gengliang Wang
11.	ANSI mode: allow implicitly casting String to other simple types	Resolved	Gengliang Wang

People

Assignee: Gengliang Wang
Reporter: Gengliang Wang
Votes: 0
Watchers: 3

Dates

Created: 26/Jan/21 15:30
Updated: 10/Aug/24 21:01
Resolved: 24/Feb/21 05:41