Details
Epic
Status: Open
Major
Resolution: Unresolved
4.0.0
None
None
Collation support
Description
This feature will introduce collation support to the Spark engine. This means that:
- Every StringType will have an associated collation. The default remains UTF8 Binary, which will follow the same rules as the current UTF8 String comparison.
- Collation will be respected in all collation-sensitive operations: comparisons, hashing, and string operations (contains, startsWith, endsWith, etc.).
- Collation can be set in the following ways:
  - With a COLLATE expression, e.g. strExpr COLLATE collation_name.
  - In a CREATE TABLE column definition.
  - By setting the session collation.
- All Spark operators need to respect collation settings (filters, joins, shuffles, aggregations, etc.).
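To make the requirement concrete that comparisons, hashing, and string operations all follow the same collation, here is a small toy model in plain Python. It is only a sketch and not Spark's implementation: the collation names "binary" and "lowercase", the CollatedString class, and the group_counts helper are hypothetical names introduced for illustration, and lowercasing stands in for whatever case-insensitive rules a real collation would apply.

```python
# Toy model only: the collation names and classes below are hypothetical,
# not part of Spark or of the design doc.

def collation_key(value, collation):
    """Return the key that a collation compares and hashes on."""
    if collation == "binary":
        return value          # compare the string unchanged
    if collation == "lowercase":
        return value.lower()  # simplified case-insensitive rule
    raise ValueError("unknown collation: " + collation)


class CollatedString:
    """A string value paired with the collation that governs it."""

    def __init__(self, value, collation="binary"):
        self.value = value
        self.collation = collation
        self.key = collation_key(value, collation)

    def __eq__(self, other):
        if not isinstance(other, CollatedString):
            return NotImplemented
        if self.collation != other.collation:
            raise ValueError("cannot compare strings with different collations")
        return self.key == other.key

    def __hash__(self):
        # Hash on the key so that values equal under the collation hash equally.
        return hash((self.collation, self.key))

    def startswith(self, prefix):
        return self.key.startswith(collation_key(prefix, self.collation))

    def contains(self, needle):
        return collation_key(needle, self.collation) in self.key


def group_counts(values, collation):
    """Count values per group, the way a hash aggregate would bucket them."""
    counts = {}
    for v in values:
        cs = CollatedString(v, collation)
        counts[cs] = counts.get(cs, 0) + 1
    return counts
```

In this model, grouping ["Spark", "SPARK", "spark"] yields three groups under "binary" and a single group under "lowercase". The point of the sketch is that equality and hashing derive from the same key; if they diverged, hash-based operators such as joins, shuffles, and aggregations would place equal values in different buckets.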
This is a high-level description of the feature. The detailed design is available under this link (the doc is attached as well).