Details
Bug
Status: Closed
Major
Resolution: Fixed
None
Description
When accessing a Hive table with a UCS2-encoded field, our implementation returns 0 rows.
This is caused by the use of strchr() in ExHdfsScanTcb::extractAndTransformAsciiSourceToSqlRow().
strchr() stops at the first '\0' it encounters, before it reaches the line delimiter '\n'. However, that '\0' may simply be the 0x00 byte of a UCS2 character, so a valid line is wrongly considered invalid.
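The behaviour described above can be illustrated with a minimal standalone sketch. It is not the Trafodion code and not necessarily the fix that was applied. The sample row, its little-endian byte order, and the helper names findDelimStrchr and findDelimMemchr are assumptions made for illustration only. The sketch contrasts a NUL-terminated search with a length-bounded search over the same bytes.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical sample row: "1001|" followed by "JBL" as UCS2 (little-endian
// byte order assumed), terminated by '\n'. Not taken from a real unload file.
static const char kSampleRow[] = {'1', '0', '0', '1', '|',
                                  'J', '\0', 'B', '\0', 'L', '\0', '\n'};
static const std::size_t kSampleRowLen = sizeof(kSampleRow);

// NUL-terminated search: strchr treats the first 0x00 byte as end of string,
// so a delimiter that comes after an embedded 0x00 is never found.
const char* findDelimStrchr(const char* buf, char delim) {
    return std::strchr(buf, delim);
}

// Length-bounded search: memchr scans exactly len bytes and treats 0x00 as an
// ordinary byte, so the delimiter after the UCS2 data is still found.
const char* findDelimMemchr(const char* buf, std::size_t len, char delim) {
    return static_cast<const char*>(std::memchr(buf, delim, len));
}
```

On this sample, findDelimStrchr(kSampleRow, '\n') returns a null pointer because the search ends at the 0x00 byte after 'J', while findDelimMemchr(kSampleRow, kSampleRowLen, '\n') returns a pointer to the final byte. A byte-wise search for 0x0A is still only an approximation for UCS2 data, since 0x0A can also occur as one half of a two-byte character.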
Scripts to reproduce:
create table sck(
userId int not null,
name varchar(20) character set UCS2
);
insert into sck values (1001, _ucs2'JBL'), (1002, _ucs2'YS '), (1003, _ucs2'8#RTG');
unload into '/ucs2test' select * from sck;
create external table hsck
(
id int,
name string
) row format delimited fields terminated by '|'
location '/ucs2test';
select * from hive.hive.hsck;
Assigned to LaunchPad User khaled Bouaziz