  1. Spark
  2. SPARK-46841

Language support for collations


Details

    Description

      Language and localization support for collations is provided by the ICU library. The collation naming format is as follows:

      <2-letter language code>[_<4-letter script>][_<3-letter country code>][_specifier_specifier...]

      The locale specifier is the first part of the collation name (language + script + country). Locale specifiers need to be stable across ICU versions; to keep existing ids and names invariant, we introduce a golden file with the locale table, which should cause a CI failure on any silent change.
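
      The golden-file idea above can be sketched as a comparison between a checked-in locale table and the table produced by the current ICU version. This is only an illustration: the function name `diff_locale_table`, the dictionary representation, and the sample ids are hypothetical and do not come from the Spark code base.

```python
from typing import Dict, List


def diff_locale_table(golden: Dict[str, int], current: Dict[str, int]) -> List[str]:
    """Compare a golden locale -> id table with the current one.

    Returns a sorted list of human-readable differences; an empty list
    means nothing changed. Both tables are hypothetical stand-ins for
    whatever the real golden file stores.
    """
    problems = []
    for locale, golden_id in golden.items():
        if locale not in current:
            # A locale that used to exist disappeared (e.g. after an ICU upgrade).
            problems.append(f"removed locale: {locale}")
        elif current[locale] != golden_id:
            # An existing locale silently received a different id.
            problems.append(f"id changed for {locale}: {golden_id} -> {current[locale]}")
    for locale in current.keys() - golden.keys():
        # New locales are reported too, so the golden file must be updated on purpose.
        problems.append(f"new locale not in golden file: {locale}")
    return sorted(problems)


if __name__ == "__main__":
    golden = {"en": 1, "de": 2}    # made-up ids
    current = {"en": 1, "de": 5}   # "de" changed id
    for line in diff_locale_table(golden, current):
        print(line)
```

      A CI test built on such a check would fail whenever the returned list is non-empty, which forces any change to the table to be made explicitly.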

      Currently supported optional specifiers:

      • CS/CI - case sensitivity; default is case-sensitive; supported by configuring ICU collation levels
      • AS/AI - accent sensitivity; default is accent-sensitive; supported by configuring ICU collation levels
      • <unspecified>/LCASE/UCASE - case conversion performed prior to comparisons; supported by an internal implementation that relies on ICU locale-aware conversions
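
      To make the intent of these specifiers concrete, here is a rough standard-library approximation of how they affect equality. It does not use ICU, is not locale-aware, and says nothing about ordering; the helper names (`strip_accents`, `comparison_key`, `equal_under`) are hypothetical and are not the Spark implementation.

```python
import unicodedata
from typing import Optional


def strip_accents(text: str) -> str:
    """Drop combining marks after canonical decomposition (a crude stand-in for AI)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def comparison_key(text: str,
                   case_sensitive: bool = True,
                   accent_sensitive: bool = True,
                   conversion: Optional[str] = None) -> str:
    """Build a key so that two strings are 'equal' iff their keys match.

    Defaults mirror the ticket: case-sensitive, accent-sensitive, no conversion.
    """
    # LCASE/UCASE: convert before comparing (locale-unaware here, unlike ICU).
    if conversion == "LCASE":
        text = text.lower()
    elif conversion == "UCASE":
        text = text.upper()
    elif conversion is not None:
        raise ValueError(f"unknown conversion: {conversion}")
    if not accent_sensitive:   # AI
        text = strip_accents(text)
    if not case_sensitive:     # CI
        text = text.casefold()
    return text


def equal_under(a: str, b: str, **options) -> bool:
    """Compare two strings under the given specifier options."""
    return comparison_key(a, **options) == comparison_key(b, **options)


if __name__ == "__main__":
    print(equal_under("Résumé", "résumé"))                         # default CS + AS
    print(equal_under("Résumé", "résumé", case_sensitive=False))   # CI
    print(equal_under("resume", "résumé", accent_sensitive=False)) # AI
```

      In the real feature, CS/CI and AS/AI are handled by ICU collation levels rather than by string rewriting, so results can differ from this sketch for locale-specific cases.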

      Users can list collation specifiers in any order, except for the locale, which is mandatory and must come first. There is a one-to-one mapping between collation ids and collation names, defined in CollationFactory.
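
      A minimal parser for the naming format might look like the sketch below. It assumes that specifiers are matched exactly in upper case and that conflicting or repeated specifiers are rejected; neither rule is stated in this ticket. The `Collation` type, `parse_collation_name`, and the sample names are hypothetical and are not the CollationFactory API.

```python
from typing import NamedTuple, Optional


class Collation(NamedTuple):
    language: str
    script: Optional[str]
    country: Optional[str]
    case_sensitive: bool
    accent_sensitive: bool
    conversion: Optional[str]


# Each specifier belongs to one group; at most one specifier per group is allowed.
SPECIFIER_GROUPS = {
    "CS": "case", "CI": "case",
    "AS": "accent", "AI": "accent",
    "LCASE": "conversion", "UCASE": "conversion",
}


def _is_letters(token: str, length: int) -> bool:
    return len(token) == length and token.isascii() and token.isalpha()


def parse_collation_name(name: str) -> Collation:
    tokens = name.split("_")
    # The locale is mandatory and comes first: a 2-letter language code.
    language = tokens[0]
    if not _is_letters(language, 2):
        raise ValueError(f"missing or invalid language code in {name!r}")
    rest = tokens[1:]
    script = country = None
    # Optional 4-letter script, then optional 3-letter country code.
    if rest and _is_letters(rest[0], 4):
        script = rest.pop(0)
    if rest and _is_letters(rest[0], 3):
        country = rest.pop(0)
    # Remaining tokens are specifiers and may appear in any order.
    chosen = {}
    for token in rest:
        group = SPECIFIER_GROUPS.get(token)
        if group is None:
            raise ValueError(f"unknown specifier {token!r} in {name!r}")
        if group in chosen:
            raise ValueError(f"more than one {group} specifier in {name!r}")
        chosen[group] = token
    return Collation(
        language=language,
        script=script,
        country=country,
        case_sensitive=chosen.get("case", "CS") == "CS",        # default: CS
        accent_sensitive=chosen.get("accent", "AS") == "AS",    # default: AS
        conversion=chosen.get("conversion"),                    # default: none
    )


if __name__ == "__main__":
    print(parse_collation_name("sr_Cyrl_SRB_CI_AI"))
    # Specifier order does not matter:
    print(parse_collation_name("en_AI_CI") == parse_collation_name("en_CI_AI"))
```

      Because script (4 letters), country (3 letters), and the current specifiers (2 or 5 letters) all differ in length, the locale prefix can be separated from the specifiers without ambiguity in this sketch.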

            People

              nikolamand-db Nikola Mandic
              dbatomic Aleksandar Tomic