  1. Apache Drill
  2. DRILL-5477

String functions (lower, upper, initcap) should work for UTF-8

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.10.0
    • Fix Version/s: None
    • Component/s: Functions - Drill

    Description

      Drill's string functions lower, upper and initcap work only for ASCII input, not for multi-byte UTF-8 input. UTF-8 is a variable-length encoding, so its bytes must be decoded into Unicode characters before case mapping can be applied, and re-encoded afterwards. Without that step, these functions do not work for Cyrillic, Greek or any other non-ASCII script with an upper/lower case distinction.

      Currently, when a user applies these functions to non-ASCII UTF-8 input, Drill returns the value unchanged.
      Example:

      select upper('привет') from (values(1)) -> привет
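
      The behavior above can be reproduced outside Drill. The following is only a minimal sketch, not Drill's actual implementation: the class and method names (Utf8UpperSketch, asciiUpper, unicodeUpper) are hypothetical. It assumes an ASCII-only function works byte by byte, and contrasts that with decoding the UTF-8 bytes before case mapping. The Cyrillic literal is written with Unicode escapes so the source compiles regardless of file encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class Utf8UpperSketch {

    // Hypothetical ASCII-only uppercasing: only bytes 'a'..'z' are changed.
    // Bytes of multi-byte UTF-8 sequences are >= 0x80 (negative as Java bytes),
    // so they never match the range and are copied through unchanged.
    static byte[] asciiUpper(byte[] utf8) {
        byte[] out = utf8.clone();
        for (int i = 0; i < out.length; i++) {
            if (out[i] >= 'a' && out[i] <= 'z') {
                out[i] -= 32;
            }
        }
        return out;
    }

    // Decode UTF-8 to a String, apply Unicode case mapping, encode back.
    static byte[] unicodeUpper(byte[] utf8) {
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        return decoded.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "привет" written with Unicode escapes
        byte[] input = "\u043f\u0440\u0438\u0432\u0435\u0442".getBytes(StandardCharsets.UTF_8);

        // The ASCII-only variant returns the same bytes it was given.
        System.out.println(new String(asciiUpper(input), StandardCharsets.UTF_8));
        // The decoding variant returns the uppercase Cyrillic form.
        System.out.println(new String(unicodeUpper(input), StandardCharsets.UTF_8));
    }
}
```

      A real fix would also have to cover lower and initcap, and case mappings that change the encoded byte length; the sketch does not address either.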
      

      There is a disabled unit test in https://github.com/arina-ielchiieva/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/expr/fn/impl/TestStringFunctions.java#L33 which should be enabled once this issue is fixed.

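      Please note that, by default, Calcite does not accept non-ASCII string literals such as the one above: as the error below shows, it tries to encode them in ISO-8859-1. Set the system property saffron.default.charset to UTF-16LE if you encounter the following error: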

      org.apache.drill.exec.rpc.RpcException: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: CalciteException: Failed to encode 'привет' in character set 'ISO-8859-1'
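
      The error can be explained with a plain JDK charset check. This is a sketch only: the class name CharsetCheckSketch and its helper are hypothetical. It shows that ISO-8859-1 cannot represent the Cyrillic literal while UTF-16LE can. The System.setProperty call assumes the property is read when Calcite initializes, so it has to be set before then. Passing -Dsaffron.default.charset=UTF-16LE on the JVM command line is the usual way to set a system property, but where that option goes in a given Drill deployment is not covered here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCheckSketch {

    // True if every character of s can be encoded in the given charset.
    static boolean canEncode(Charset charset, String s) {
        return charset.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        // "привет" written with Unicode escapes
        String literal = "\u043f\u0440\u0438\u0432\u0435\u0442";

        // ISO-8859-1 covers only U+0000..U+00FF, so Cyrillic cannot be encoded.
        System.out.println("ISO-8859-1: " + canEncode(StandardCharsets.ISO_8859_1, literal));
        // UTF-16LE covers all of Unicode.
        System.out.println("UTF-16LE:   " + canEncode(StandardCharsets.UTF_16LE, literal));

        // Assumption: this must run before any Calcite class reads the property.
        System.setProperty("saffron.default.charset", "UTF-16LE");
    }
}
```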
      

            People

              Assignee: Unassigned
              Reporter: Arina Ielchiieva