Spark / SPARK-47617

Add TPC-DS testing infrastructure for collations

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: SQL

    Description

      As collation support grows across all SQL features and new collation types are added, we need a reliable testing model that covers as many standard SQL capabilities as possible.

      We can reuse the TPC-DS testing infrastructure already present in Spark. The idea is to vary the string columns of the TPC-DS tables by applying multiple collations with different ordering rules and case sensitivity, producing new tables. For certain batches of collations, these tables should yield the same results for the predefined TPC-DS queries. For example, when a query is run on a table whose columns are collated as UTF8_BINARY and then on a table whose columns are collated as UTF8_BINARY_LCASE, the results should be the same after converting them to lowercase.
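
      The equivalence described above can be illustrated with a minimal sketch that does not use Spark at all. It assumes a toy GROUP BY in plain Python where the `key` function stands in for the collation's notion of equality; `group_count` and `normalize` are hypothetical helper names, not part of Spark or of the proposed suite.

```python
from collections import Counter


def group_count(rows, key):
    """Toy stand-in for `SELECT col, count(*) ... GROUP BY col`.

    `key` models the collation: identity for a binary collation,
    str.lower for a case-insensitive one (an assumption of this sketch).
    """
    counts = Counter()
    representative = {}
    for value in rows:
        k = key(value)
        # Keep the first value seen as the group's output value.
        representative.setdefault(k, value)
        counts[k] += 1
    return [(representative[k], n) for k, n in counts.items()]


def normalize(result):
    """Lowercase the string column and sort, so results can be compared."""
    return sorted((value.lower(), n) for value, n in result)


rows = ["Books", "books", "BOOKS", "Music", "music"]

# Binary collation over data that was lowercased up front.
binary_on_lowercased = group_count([r.lower() for r in rows], key=lambda s: s)

# Case-insensitive collation over the original mixed-case data.
lcase_on_original = group_count(rows, key=str.lower)

assert normalize(binary_on_lowercased) == normalize(lcase_on_original)
```

      The comparison only holds after normalization, because a case-insensitive grouping may return any casing of a value as the group's representative.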

      Introduce a new query suite that tests the described behavior with the available collations (utf8_binary and unicode) combined with case conversions (lowercase, uppercase, and randomized case for fuzz testing).
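
      As a rough sketch of the case conversions mentioned above, the following plain-Python snippet produces lowercase, uppercase, and randomized-case variants of a column's values. The helper names and the seeded generator are assumptions made for illustration; the actual suite may generate its variants differently.

```python
import random


def randomize_case(text, rng):
    """Flip each character to upper or lower case with equal probability."""
    return "".join(
        ch.upper() if rng.random() < 0.5 else ch.lower() for ch in text
    )


def case_variants(values, seed=0):
    """Build the three case variants of a string column (hypothetical helper).

    A fixed seed keeps the randomized variant reproducible between runs.
    """
    rng = random.Random(seed)
    return {
        "lowercase": [v.lower() for v in values],
        "uppercase": [v.upper() for v in values],
        "randomized": [randomize_case(v, rng) for v in values],
    }


variants = case_variants(["Books", "Music"])

# Every variant collapses to the same values once lowercased, which is the
# property a case-insensitive collation is expected to preserve.
for name, column in variants.items():
    assert [v.lower() for v in column] == ["books", "music"], name
```

      The example sticks to ASCII values; characters whose upper and lower forms do not round-trip would need separate handling.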

            People

              Assignee: Nikola Mandic (nikolamand-db)
              Reporter: Nikola Mandic (nikolamand-db)