  1. Spark
  2. SPARK-46837 String function support (parent)
  3. SPARK-47353

Mode expression for strings (all collations)


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: SQL

    Description

      Enable collation support for the Mode expression in Spark. First confirm the expected behaviour of this expression when given collated strings, then move on to an implementation that handles strings of all collation types. Implement the corresponding unit tests and E2E SQL tests to reflect how this function should be used with collation in Spark SQL, and feel free to use your preferred Spark SQL editor to experiment with the existing functions and learn more about how they work. In addition, look into the possible use cases and implementations of similar functions in other open-source DBMSs, such as PostgreSQL.

       

      The goal for this Jira ticket is to implement the Mode expression so it supports all collation types currently supported in Spark. To understand what changes were introduced in order to enable full collation support for other existing functions in Spark, take a look at the Spark PRs and Jira tickets for completed tasks in this parent (for example: Contains, StartsWith, EndsWith).

      Examples:

      With UTF8_BINARY collation, the query
      SELECT mode(col) FROM VALUES ('a'), ('a'), ('a'), ('B'), ('B'), ('b'), ('b') AS tab(col);
      should return 'a'.

      With UTF8_BINARY_LCASE collation, the query
      SELECT mode(col) FROM VALUES ('a'), ('a'), ('a'), ('B'), ('B'), ('b'), ('b') AS tab(col);
      should return either 'B' or 'b'.
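
      The two expected results above can be reproduced with a minimal Python sketch. This is not Spark's implementation. It only illustrates the idea of counting values by a collation key rather than by the raw string. The helper names `collation_key` and `collated_mode` are hypothetical, and treating UTF8_BINARY_LCASE as plain lowercasing is an assumption made for illustration.

      ```python
      from collections import Counter


      def collation_key(value: str, collation: str) -> str:
          """Map a string to the key under which collation-equal strings are grouped."""
          if collation == "UTF8_BINARY":
              return value          # exact comparison: 'B' and 'b' are different values
          if collation == "UTF8_BINARY_LCASE":
              return value.lower()  # assumption: case-insensitive comparison via lowercasing
          raise ValueError(f"collation not covered by this sketch: {collation}")


      def collated_mode(values, collation):
          """Return one most frequent value, counting collation-equal strings together."""
          counts = Counter()
          representative = {}
          for v in values:
              key = collation_key(v, collation)
              counts[key] += 1
              # Keep the first spelling seen for each key; any spelling in the
              # winning group would be an equally valid result.
              representative.setdefault(key, v)
          if not counts:
              return None
          best_key, _ = counts.most_common(1)[0]
          return representative[best_key]


      rows = ["a", "a", "a", "B", "B", "b", "b"]
      print(collated_mode(rows, "UTF8_BINARY"))        # 'a' occurs 3 times; 'B' and 'b' only 2 each
      print(collated_mode(rows, "UTF8_BINARY_LCASE"))  # 'B' and 'b' together occur 4 times
      ```

      In the sketch the case-insensitive call returns the first spelling encountered, which is why the ticket allows either 'B' or 'b': the group wins, and the choice of representative within it is left open.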

       

      Read more about ICU Collation Concepts and the Collator class. Also refer to the Unicode Technical Standard for collation (UTS #10, the Unicode Collation Algorithm).

          People

            Assignee: gpgp Gideon P
            Reporter: uros-db Uroš Bojanić
            Votes: 0
            Watchers: 4
