SPARK-46837: String function support (parent)

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      TODO: List of all functions that need to be updated for collation support:

       

      Feature/function      Priority  Type
      Shuffle               0         comparison
      Delta Columns         0         storage
      Partition key         0         storage
      Comparison operators  0         comparison
      IN list               0         comparison
      GROUP BY              0         comparison
      MERGE, HASH joins     0         comparison
      ORDER BY              0         sorting
      Aggregation           0         comparison
      like                  0         comparison
      regexp_*              0         matching
      concat                0         pass through
      substr                0         pass through
      between               0         comparison
      coalesce              0         pass through
      IS DISTINCT           0         comparison
      trim                  0         pass through
      instr                 0         comparison
      lcase                 0         pass through, modify
      lower                 0         pass through, modify
      replace               0         comparison
      ucase                 0         pass through, modify
      upper                 0         pass through, modify
      count(distinct)       0         comparison
      min/max               0         comparison, pass through
      array                 0         pass through
      case                  0         pass through
      decode                0         pass through
      elt                   0         comparison, pass through
      nullif, nvl, nvl2     0         pass through, comparison

      Session variables     1         storage
      SQL UDF               1         storage, pass through
      Python UDF            1         storage
      Array element         1         storage
      Map key               1         storage, comparison
      Map value             1         storage
      Struct field          1         storage
      least/greatest        1         comparison, pass through
      if/iff/ifnull         1         pass through, comparison
      mapExpr [ keyExpr ]   1         comparison, pass through
      concat_ws             1         pass through
      contains              1         comparison
      left                  1         pass through
      *pad                  1         pass through
      repeat                1         pass through
      reverse               1         pass through
      translate             1         comparison, pass through
      array_agg             1         pass through
      first/last/any        1         pass through
      mode                  1         comparison, pass through
      array_*               1         pass through, dedup (array distinct)

      explode               2         pass through
      filter                2         pass through
      flatten               2         pass through
      inline*               2         pass through
      reduce                2         pass through
      reverse               2         pass through
      shuffle               2         pass through
      slice                 2         pass through
      sort_array            2         comparison, pass through
      transform             2         pass through
      zip*                  2         pass through
      map                   2         pass through
      map_*                 2         pass through
      str_to_map            2         comparison, pass through
      transform*            2         pass through
      stack                 2         pass through
      describe              2         display
      ilike                 2         matching
      charindex             2         comparison
      endswith              2         comparison
      startswith            2         comparison
      find_in_set           2         comparison
      initcap               2         pass through, modify
      locate                2         comparison
      mask                  2         pass through
      overlay               2         pass through
      position              2         comparison
      sentences             2         comparison, pass through
      split                 2         comparison, pass through
      split_part            2         comparison, pass through
      collect_list          2         pass through
      collect_set           2         pass through
      min_by/max_by         2         comparison, pass through
      element_at, []        2         pass through
      aggregate             2         pass through
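
      One reading of the Type column above: a "comparison" function has to consult the collation to decide its result, while a "pass through" function never compares characters, so its result does not depend on the collation. The sketch below illustrates that split in plain Python. It is not Spark code: collation_key and the other helper names are hypothetical, and Python's str.lower() only stands in for a lowercase collation such as UTF8_BINARY_LCASE.

```python
# Illustration only. str.lower() is a stand-in for a lowercase collation;
# it is NOT Spark's UTF8_BINARY_LCASE implementation.

def collation_key(s: str, collation: str) -> str:
    """Return the key used to compare `s` under `collation` (hypothetical helper)."""
    if collation == "UTF8_BINARY":
        return s            # compare the string as-is
    if collation == "UTF8_BINARY_LCASE":
        return s.lower()    # compare case-insensitively
    raise ValueError(f"unsupported collation: {collation}")


# Type "comparison": the answer changes with the collation.
def contains(left: str, right: str, collation: str) -> bool:
    return collation_key(right, collation) in collation_key(left, collation)


# Type "pass through": no characters are compared, so no collation is needed.
def repeat(s: str, n: int) -> str:
    return s * n


# Type "comparison, pass through": values are ordered by their collation key,
# but the original (unmodified) value is returned.
def collated_max(values: list[str], collation: str) -> str:
    return max(values, key=lambda v: collation_key(v, collation))


if __name__ == "__main__":
    print(contains("Spark SQL", "sql", "UTF8_BINARY"))
    print(contains("Spark SQL", "sql", "UTF8_BINARY_LCASE"))
    print(collated_max(["apple", "Banana"], "UTF8_BINARY_LCASE"))
```

      Under this reading, min/max is listed as "comparison, pass through" because it orders values by collation but returns one of its inputs unchanged.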

      Sub-Tasks

        1. contains, startswith, endswith (binary & lowercase collation only) | Sub-task | Resolved | Uroš Bojanić
        2. contains (all collations) | Sub-task | Resolved | Uroš Bojanić
        3. startswith, endswith (all collations) | Sub-task | Resolved | Stevo Mitric
        4. new test suite for UTF8String | Sub-task | Resolved | Uroš Bojanić
        5. fail all unsupported functions | Sub-task | Resolved | Uroš Bojanić
        6. Resolve AbstractDataType simpleStrings for StringTypeCollated | Sub-task | Resolved | Mihailo Milosevic
        7. refactor UTF8String and CollationFactory | Sub-task | Resolved | Uroš Bojanić
        8. Fix CollationSupport test output | Sub-task | Resolved | Unassigned
        9. endsWith and startsWith don't work correctly for some collations | Sub-task | Resolved | Vladimir Golubev
        10. Add benchmark for stringpredicate expressions | Sub-task | Resolved | Uroš Bojanić
        11. Optimize string predicate expressions for UTF8_BINARY_LCASE collation | Sub-task | Resolved | Uroš Bojanić
        12. Regexp expressions (binary & lowercase collation only) | Sub-task | Resolved | Uroš Bojanić
        13. Add support for ConcatWs & Elt (all collations) | Sub-task | Resolved | Mihailo Milosevic
        14. Add support for Upper, Lower, InitCap (all collations) | Sub-task | Resolved | Mihailo Milosevic
        15. Fix Upper, Lower, InitCap collation awareness | Sub-task | Resolved | Uroš Bojanić
        16. StringRepeat (all collations) | Sub-task | Resolved | Milan Dankovic
        17. StringReplace (all collations) | Sub-task | Resolved | Uroš Bojanić
        18. Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck (all collations) | Sub-task | Resolved | Nikola Mandic
        19. StringTranslate (all collations) | Sub-task | Resolved | Milan Dankovic
        20. StringTrim & StringTrimLeft/Right/Both (binary & lowercase collation only) | Sub-task | Resolved | David Milicevic
        21. StringInstr, FindInSet (all collations) | Sub-task | Resolved | Milan Dankovic
        22. StringLPad, StringRPad (all collations) | Sub-task | Resolved | Gideon P
        23. Substring, Right, Left (all collations) | Sub-task | Resolved | Gideon P
        24. Levenshtein (all collations) | Sub-task | Open | Unassigned
        25. When the collationId is invalid, throw `COLLATION_INVALID_ID` | Sub-task | Open | Unassigned
        26. Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences (all collations) | Sub-task | Resolved | Nikola Mandic
        27. SplitPart (binary & lowercase collation only) | Sub-task | Resolved | Uroš Bojanić
        28. Add Collation Support for trim/ltrim/rtrim | Sub-task | Resolved | Unassigned
        29. Mode (all collations) | Sub-task | Open | Unassigned
        30. StringToMap & Mask (all collations) | Sub-task | Resolved | Uroš Bojanić
        31. Fix mathExpressions that use StringType | Sub-task | Resolved | Mihailo Milosevic
        32. Use wildcard imports in CollationTypeCasts | Sub-task | Resolved | Unassigned
        33. Format expressions (all collations) | Sub-task | Resolved | Uroš Bojanić
        34. Variant expressions (all collations) | Sub-task | Resolved | Uroš Bojanić
        35. Add support for AbstractMapType | Sub-task | Resolved | Uroš Bojanić
        36. URL expressions (all collations) | Sub-task | Resolved | Uroš Bojanić
        37. Miscellaneous expressions (all collations) | Sub-task | Resolved | Uroš Bojanić
        38. CurrentLike - Database/Schema, Catalog, User (all collations) | Sub-task | Open | Unassigned
        39. JSON expressions (all collations) | Sub-task | Resolved | Uroš Bojanić
        40. CSV expressions (all collations) | Sub-task | Resolved | Uroš Bojanić
        41. XML expressions (all collations) | Sub-task | Resolved | Uroš Bojanić
        42. XPath expressions (all collations) | Sub-task | Resolved | Uroš Bojanić
        43. inputFile expressions (all collations) | Sub-task | Resolved | Uroš Bojanić
        44. DateFormatClass (all collations) | Sub-task | Open | Unassigned
        45. Datetime expressions (all collations) | Sub-task | Open | Unassigned
        46. Alter logic for: startsWith, endsWith, contains, locate (UTF8_BINARY_LCASE) | Sub-task | Open | Unassigned
        47. Alter logic for: instr, substring_index (UTF8_BINARY_LCASE) | Sub-task | Open | Unassigned
        48. Alter logic for: find_in_set, replace (UTF8_BINARY_LCASE) | Sub-task | Open | Unassigned
        49. Implement modified Lowercase operation for UTF8_BINARY_LCASE | Sub-task | Open | Unassigned
        50. Add Expression Walker for Testing | Sub-task | Open | Unassigned
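
        Several titles in the list above scope support to "binary & lowercase collation only" or "all collations", and one asks to fail all unsupported functions. A minimal sketch of how such a guard could look, in plain Python, follows. It is an assumption about the intent, not Spark's implementation: the registry, the function names in it, and UnsupportedCollationError are hypothetical, and UNICODE is used only as an example of a collation that is neither binary nor lowercase.

```python
# Hypothetical guard: reject collations a function does not (yet) support.
# Names and registry contents are illustrative, not taken from Spark's code.

DEFAULT_COLLATION = "UTF8_BINARY"
BINARY_AND_LCASE = frozenset({"UTF8_BINARY", "UTF8_BINARY_LCASE"})
ALL_COLLATIONS = None  # sentinel meaning "every collation is accepted"

# A few entries mirroring the scope notes in the sub-task titles.
SUPPORT = {
    "contains": ALL_COLLATIONS,      # "contains (all collations)"
    "repeat": ALL_COLLATIONS,        # "StringRepeat (all collations)"
    "split_part": BINARY_AND_LCASE,  # "SplitPart (binary & lowercase collation only)"
}


class UnsupportedCollationError(Exception):
    """Raised when a function is called with a collation it does not support."""


def is_supported(function: str, collation: str) -> bool:
    # Assumption: the default binary collation always works, because it
    # matches the behavior functions had before collations existed.
    if collation == DEFAULT_COLLATION:
        return True
    if function not in SUPPORT:
        return False  # not yet updated for collations: fail
    allowed = SUPPORT[function]
    return allowed is ALL_COLLATIONS or collation in allowed


def check_collation(function: str, collation: str) -> None:
    if not is_supported(function, collation):
        raise UnsupportedCollationError(
            f"{function} does not support collation {collation}"
        )


if __name__ == "__main__":
    check_collation("split_part", "UTF8_BINARY_LCASE")  # accepted
    try:
        check_collation("split_part", "UNICODE")
    except UnsupportedCollationError as e:
        print(e)
```

        With a guard of this shape, a function that has not been updated yet fails explicitly for non-default collations instead of silently falling back to binary comparison.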

          People

            Assignee: Unassigned
            Reporter: Aleksandar Tomic (dbatomic)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated: