Details
Improvement
Status: Open
Major
Resolution: Unresolved
1.10.0
None
Description
The Drill string functions lower, upper, and initcap work only on ASCII characters, not on non-ASCII characters in UTF-8 strings. UTF-8 is a multi-byte encoding, so its bytes must be decoded into Unicode characters before case conversion can be applied. Without that decoding step, these functions do not work for Cyrillic, Greek, or any other script that distinguishes upper and lower case.
Currently, when a user applies these functions to a UTF-8 string containing non-ASCII characters, Drill returns the input value unchanged.
Example:
select upper('привет') from (values(1)) -> привет (expected: ПРИВЕТ)
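The difference between the two behaviours can be sketched in plain Java. This is a minimal illustration, not Drill's actual implementation: the class and method names (Utf8CaseSketch, asciiUpper, unicodeUpper) are hypothetical, and the Unicode-aware variant simply decodes to a java.lang.String, which is only one possible way to fix the issue.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical sketch; none of these names come from the Drill code base.
public class Utf8CaseSketch {

    // ASCII-only approach: shifts bytes in the range 'a'..'z' and leaves
    // every other byte alone. Bytes of multi-byte UTF-8 sequences are
    // negative as Java bytes, so they never match and pass through unchanged.
    public static byte[] asciiUpper(byte[] utf8) {
        byte[] out = utf8.clone();
        for (int i = 0; i < out.length; i++) {
            if (out[i] >= 'a' && out[i] <= 'z') {
                out[i] -= 32;
            }
        }
        return out;
    }

    // Unicode-aware approach: decode UTF-8 into characters, convert the
    // case, then encode back to UTF-8.
    public static byte[] unicodeUpper(byte[] utf8) {
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        return decoded.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "привет" written with Unicode escapes so the source file
        // encoding does not matter.
        byte[] input = "\u043f\u0440\u0438\u0432\u0435\u0442"
                .getBytes(StandardCharsets.UTF_8);
        // What is displayed depends on the console encoding.
        System.out.println(new String(asciiUpper(input), StandardCharsets.UTF_8));
        System.out.println(new String(unicodeUpper(input), StandardCharsets.UTF_8));
    }
}
```

The byte-wise variant returns the Cyrillic input unchanged, which matches the behaviour reported above, while the decoding variant upper-cases it.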
There is a disabled unit test in https://github.com/arina-ielchiieva/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/expr/fn/impl/TestStringFunctions.java#L33 which should be enabled once this issue is fixed.
Please note that Calcite does not allow UTF-8 by default. Set the system property saffron.default.charset to UTF-16LE if you encounter the following error:
org.apache.drill.exec.rpc.RpcException: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: CalciteException: Failed to encode 'привет' in character set 'ISO-8859-1'
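One way to set this property from Java, for example in a test setup, is sketched below. The class and method names are hypothetical, and the sketch assumes the property is set before Calcite reads it; whether a value set at runtime is picked up depends on when Calcite initializes, so passing -Dsaffron.default.charset=UTF-16LE as a JVM option at startup may be required instead.

```java
// Hypothetical helper; not part of Drill or Calcite.
public class CharsetPropertySketch {

    // Sets the property named in this issue and returns the value now in
    // effect. Assumption: this runs before Calcite reads the property.
    public static String useUtf16() {
        System.setProperty("saffron.default.charset", "UTF-16LE");
        return System.getProperty("saffron.default.charset");
    }

    public static void main(String[] args) {
        System.out.println(useUtf16());
    }
}
```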
Issue Links
- incorporates: DRILL-6717 lower and upper functions not works with national charactes (Resolved)