Spark / SPARK-46830

Introducing collation concept into Spark

Details

    • Type: Epic
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None
    • Epic Name: Collation support

    Description

      This feature will introduce collation support to the Spark engine. This means that:

      1. Every StringType will have an associated collation. The default remains UTF8 Binary, which follows the same rules as the current UTF8 String comparison.
      2. Collation will be respected in all collation-sensitive operations: comparisons, hashing, and string operations (contains, startsWith, endsWith, etc.).
      3. Collation can be set in the following ways:
        1. With a COLLATE expression, e.g. strExpr COLLATE collation_name.
        2. In a CREATE TABLE column definition.
        3. By setting the session collation.
      4. All Spark operators need to respect collation settings (filters, joins, shuffles, aggregations, etc.).
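
      To make points 2 and 4 concrete, here is a minimal conceptual sketch in plain Python. It is not Spark code and does not reflect the actual implementation or the attached design. The class CollatedString, the registry COLLATIONS, the collation name LOWERCASE_SKETCH and the helper group_count are all hypothetical names made up for illustration. The sketch assumes a collation can be modeled as a function that maps a string to a comparison key, and that mixing collations in one operation is an error. Real Unicode collations are considerably more involved, especially for contains-style matching.

```python
from collections import defaultdict

# Hypothetical registry: a collation is modeled as a function from a string to a
# "collation key". Two strings are equal under a collation iff their keys are equal.
COLLATIONS = {
    # Byte-wise comparison of the UTF-8 encoding (models the default behavior).
    "UTF8_BINARY": lambda s: s.encode("utf-8"),
    # Made-up case-insensitive collation, for illustration only.
    "LOWERCASE_SKETCH": lambda s: s.lower().encode("utf-8"),
}


class CollatedString:
    """A string value paired with a collation name (illustrative, not a Spark type)."""

    def __init__(self, value, collation="UTF8_BINARY"):
        self.value = value
        self.collation = collation

    def key(self):
        return COLLATIONS[self.collation](self.value)

    def _check(self, other):
        # Assumption of this sketch: operands must share a collation.
        if self.collation != other.collation:
            raise ValueError("collation mismatch")

    # Comparisons go through the collation key.
    def __eq__(self, other):
        if not isinstance(other, CollatedString):
            return NotImplemented
        self._check(other)
        return self.key() == other.key()

    def __lt__(self, other):
        self._check(other)
        return self.key() < other.key()

    # Hashing must agree with equality, so it also uses the key.
    def __hash__(self):
        return hash((self.collation, self.key()))

    # String operations compare keys rather than raw values.
    def contains(self, other):
        self._check(other)
        return other.key() in self.key()

    def starts_with(self, other):
        self._check(other)
        return self.key().startswith(other.key())

    def ends_with(self, other):
        self._check(other)
        return self.key().endswith(other.key())


def group_count(values):
    """Hash-based aggregation: values equal under their collation share one group."""
    counts = defaultdict(int)
    for v in values:
        counts[v] += 1
    return dict(counts)
```

      Because equality and hashing both go through the key, any hash-based operator built on top (grouping, hash joins, shuffles by key) puts "Spark" and "SPARK" in the same bucket under the case-insensitive collation, while the default binary collation keeps them apart.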

      This is a high-level description of the feature. The detailed design is in the linked document, which is also attached to this issue.

      Attachments

        1. Collation Support in Spark.docx
          1.18 MB
          Aleksandar Tomic

          People

            Assignee: Unassigned
            Reporter: Aleksandar Tomic (dbatomic)
            Votes: 1
            Watchers: 4

            Dates

              Created:
              Updated: