Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: Impala 2.12.0
Fix Version/s: None
Component/s: None
Labels: None
Environment: CentOS 7.3, Hive 1.2, Impala 2.12, Java JDK 1.8, Python 2.7.5
Description
The UDF works in Hive but not in Impala.
In Hive:
select leftcutcontentudf("一二三",2);
OK
一二
In Impala:
select leftcutcontentudf("一二三",2);
+----------------------------------------+
| default.leftcutcontentudf('一二三', 2) |
+----------------------------------------+
| ??                                     |
+----------------------------------------+
The Chinese characters are changed to ?? in Impala.
To investigate, I wrote a new UDF that prints the bytes of the input String:
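import org.apache.hadoop.hive.ql.exec.UDF;

public class GetBytes extends UDF {
    // Returns the bytes of the input as space-separated signed decimals.
    public String evaluate(String input) {
        // getBytes() with no argument uses the JVM's default charset.
        byte[] bytes = input.getBytes();
        StringBuffer stringBuffer = new StringBuffer();
        for (byte b : bytes) {
            stringBuffer.append(b).append(" ");
        }
        return stringBuffer.toString();
    }
}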
It seems that the Chinese characters are changed to ??? before the UDF function is called.
select getbytes("一二三");
+-----------------------------+
| default.getbytes('一二三')  |
+-----------------------------+
| 63 63 63 63 63 63 63 63 63  |
+-----------------------------+
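For reference, 63 is the ASCII code for '?', and nine bytes correspond to three characters of three UTF-8 bytes each. The following is a minimal standalone sketch of one way such output can arise. It assumes, without confirmation from this report, that the UTF-8 bytes are decoded with an ASCII-only charset somewhere before the UDF runs. The class name CharsetRoundTrip and its helper methods are hypothetical and are not part of Impala or Hive.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not Impala or Hive code.
class CharsetRoundTrip {
    // Same formatting as the GetBytes UDF: signed decimal bytes, each followed by a space.
    static String format(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(b).append(" ");
        }
        return sb.toString();
    }

    // Bytes of the string when it is encoded explicitly as UTF-8.
    static String utf8Bytes(String s) {
        return format(s.getBytes(StandardCharsets.UTF_8));
    }

    // ASSUMPTION: simulates UTF-8 bytes being decoded with an ASCII-only charset.
    // Each non-ASCII byte decodes to U+FFFD, which then encodes back to '?' (63).
    static String asciiRoundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(utf8, StandardCharsets.US_ASCII);
        return format(decoded.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        // Unicode escapes for the three test characters keep this file ASCII-only.
        String input = "\u4e00\u4e8c\u4e09";
        System.out.println(utf8Bytes(input));      // -28 -72 -128 -28 -70 -116 -28 -72 -119
        System.out.println(asciiRoundTrip(input)); // 63 63 63 63 63 63 63 63 63
    }
}
```

If the getbytes UDF returned the first line instead of the second, the string would have reached the UDF intact. The second line matches the output above, which is consistent with (but does not prove) a charset mismatch on the Java side.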
However, a normal query returns the correct result in Impala:
select khmc_62c57e8ae0ac from collective_2085;
+-------------------+
| khmc_62c57e8ae0ac |
+-------------------+
| 淘宝              |
+-------------------+
How can I deal with this problem?
Issue Links
- duplicates IMPALA-2019 Proper UTF-8 support in string functions (Resolved)