SPARK-24469: Support collations in Spark SQL

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: SQL

    Description

      One of our use cases is to support case-insensitive comparison in operations, including aggregation and text comparison filters. Another use case is to sort using a collator. Support for collations throughout the query processor appears to be the proper way to meet these needs.

      Language-based workarounds (for the aggregation case) are insufficient:

      1. SELECT UPPER(text) ... GROUP BY UPPER(text)
        returns upper-cased values that may not appear anywhere in the input, so it introduces invalid values into the output set
      2. SELECT MIN(text) ... GROUP BY UPPER(text)
        results in poor performance in our case, in part due to the use of a sort-based aggregate
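      The difference between the two workarounds can be seen on a tiny data set. The following is a minimal sketch that runs both queries through Python's built-in sqlite3 module rather than Spark, so it only illustrates the result values, not Spark's performance or its choice of aggregate. The table name `t`, the column name `txt`, and the sample rows are assumptions made for the illustration.

```python
import sqlite3


def run_workarounds(rows):
    """Run both workaround queries from the description on an in-memory table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table and column names, chosen only for this sketch.
    conn.execute("CREATE TABLE t (txt TEXT)")
    conn.executemany("INSERT INTO t VALUES (?)", [(r,) for r in rows])

    # Workaround 1: group by and return the upper-cased value.
    upper = conn.execute(
        "SELECT UPPER(txt), COUNT(*) FROM t GROUP BY UPPER(txt) ORDER BY 1"
    ).fetchall()

    # Workaround 2: group by the upper-cased value, but return one
    # original spelling per group as its representative.
    min_rep = conn.execute(
        "SELECT MIN(txt), COUNT(*) FROM t GROUP BY UPPER(txt) ORDER BY 1"
    ).fetchall()

    conn.close()
    return upper, min_rep


sample = ["Apple", "apple", "Banana", "banana", "banana"]
upper_groups, min_groups = run_workarounds(sample)

print(upper_groups)  # group labels are upper-cased strings absent from the input
print(min_groups)    # group labels are spellings that occur in the input
```

      Here workaround 1 labels the groups "APPLE" and "BANANA", neither of which occurs in the sample rows, while workaround 2 keeps an original spelling for each group.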

      Examples of collation support in RDBMS:
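      As one illustration of what collation support looks like in a database engine, SQLite ships a built-in NOCASE collation that can be attached to a grouping key or to a comparison. The sketch below uses Python's built-in sqlite3 module; SQLite is chosen here only because it is easy to run and is not necessarily one of the systems the reporter had in mind. The table name `t`, the column name `txt`, and the sample rows are assumptions. Note that NOCASE folds ASCII letters only, so it is far narrower than a full language-aware collator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names, chosen only for this sketch.
conn.execute("CREATE TABLE t (txt TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("Apple",), ("apple",), ("Banana",), ("banana",), ("banana",)],
)

# Case-insensitive aggregation: the collation is applied to the grouping key,
# so no UPPER() call is needed and the output keeps an original spelling.
grouped = conn.execute(
    "SELECT MIN(txt), COUNT(*) FROM t GROUP BY txt COLLATE NOCASE ORDER BY 1"
).fetchall()

# Case-insensitive comparison filter: the collation is applied to the predicate.
matched = conn.execute(
    "SELECT COUNT(*) FROM t WHERE txt = 'APPLE' COLLATE NOCASE"
).fetchone()[0]

conn.close()

print(grouped)
print(matched)
```

      With a collation attached to the key, the engine decides which rows are equal, and the query text no longer has to rewrite the values it groups or filters on.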

          People

            Assignee: Unassigned
            Reporter: Alexander Shkapsky (ashkapsky)
            Votes: 1
            Watchers: 9
