[SPARK-24469] Support collations in Spark SQL - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.3.0
Fix Version/s: None
Component/s: SQL
Labels:
- bulk-closed

Description

One of our use cases is to support case-insensitive comparison in operations, including aggregation and text comparison filters. Another use case is to sort via a collator. Support for collations throughout the query processor appears to be the proper way to meet these needs.

Language-based workarounds (for the aggregation case) are insufficient:

- SELECT UPPER(text)....GROUP BY UPPER(text): introduces invalid values into the output set
- SELECT MIN(text)...GROUP BY UPPER(text): results in poor performance in our case, in part due to use of sort-based aggregate
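
To make the difference between the two workarounds concrete, here is a minimal sketch that runs both query shapes against an in-memory SQLite database through Python's sqlite3 module. This is not Spark: the table name t, the sample rows, and the use of SQLite's built-in NOCASE collation are assumptions made purely for illustration, and the sketch says nothing about the performance behaviour described above.

```python
import sqlite3

# Hypothetical table and rows, used only to illustrate the query shapes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (text TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("Spark",), ("spark",), ("SQL",), ("sql",)],
)

# Workaround 1: group on UPPER(text).
# The keys come back upper-cased, so 'SPARK' appears in the output
# even though no input row contains that exact value.
upper_rows = conn.execute(
    "SELECT UPPER(text) AS k, COUNT(*) FROM t "
    "GROUP BY UPPER(text) ORDER BY k"
).fetchall()

# Workaround 2: group on UPPER(text) but report MIN(text).
# Every reported key is a value that really occurs in the input.
min_rows = conn.execute(
    "SELECT MIN(text) AS k, COUNT(*) FROM t "
    "GROUP BY UPPER(text) ORDER BY UPPER(text)"
).fetchall()

# Collation-based grouping (SQLite's NOCASE, standing in for the
# collation support requested in this issue): rows that differ only
# in ASCII case fall into the same group without rewriting the column.
group_sizes = conn.execute(
    "SELECT COUNT(*) FROM t GROUP BY text COLLATE NOCASE"
).fetchall()

input_values = {row[0] for row in conn.execute("SELECT text FROM t")}
print(upper_rows, min_rows, group_sizes)
```

In this sketch the first query returns keys that are absent from input_values, while the second returns only original values; the collated GROUP BY yields the same two groups without wrapping the column in a function.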

Examples of collation support in RDBMS:

People

Assignee: Unassigned

Reporter: Alexander Shkapsky

Votes: 1

Watchers: 9

Dates

Created: 05/Jun/18 17:02

Updated: 08/Oct/19 05:44

Resolved: 08/Oct/19 05:44