Details
- Type: Bug
- Status: Reopened
- Priority: Critical
- Resolution: Unresolved
- Affects Version/s: 0.13.0
- Fix Version/s: None
- Component/s: None
- Labels: None
- Environment: Windows Server 2008 R2
Description
When the WHERE clause of a Hive query contains non-ASCII UTF-8 characters, the result is empty for "where content like '%丄%'" and contains all rows for "where content not like '%丄%';", even though a few rows contain this character.
Steps to reproduce:
1. Save a file called data.txt in the root container. The contents of the file are as follows.
190 丄f齄啊c狛䶴h䶴c狝
899 d狜狜㐁geg阿狚ea䶴eead狜e
137 齄鼾h狝ge㐀狛g狚阿
21 﨩﨩e㐀c狛鼾d䶴﨨
767 﨩c﨩g狜㐁狜狛齄阿﨩狚齄﨨䶵狝﨨
281 﨨㐀啊aga啊c狝e鼾鼾
573 㐁䶴hc﨨b狝㐁﨩䶴狜丄hc齄
966 䶴丄狜﨨e狝eb狜㐁c㐀鼾﨩丄ga狚丄
565 䶵㐀﨩㐀bb狛ehd丄ea丄㐀
778 﨩㐁阿﨨狚bbea丄䶵丄狚鼾狚a䶵
363 gd齄a鼾a䶴b㐁㐁fg鼾
822 a阿狜䶵h䶵e狛h﨩gac狜阿㐀啊b
338 b齄㐁ff阿e狜e㐀ba齄
2. Execute the following queries to set up the table.
a. CREATE TABLE hivetable(row INT, content STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/hivetable';
b. LOAD DATA INPATH 'wasb:///data.txt' OVERWRITE INTO TABLE hivetable;
3. Create a query file query.hql with the following contents:
INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput'
select * from hivetable where content like '%丄%';
4. Even though a few rows contain this character, the output is empty.
5. Change the contents of query.hql to:
INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput'
select * from hivetable where content not like '%丄%';
6. The output contains all rows including those containing the given character.
7. Similar results are observed when using "where content = '丄f齄啊c狛䶴h䶴c狝'; "
8. We get expected results when using "where content like '%a%'; "
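For reference, the expected output of the two queries can be derived from the sample data alone. The following is a minimal Python sketch, not part of Hive: the names `ROWS`, `NEEDLE`, `rows_like` and `rows_not_like` are hypothetical, and a plain substring test is assumed to be equivalent to `LIKE '%丄%'` for this pattern.

```python
# Sample rows copied from data.txt in step 1: (row, content).
ROWS = [
    (190, "丄f齄啊c狛䶴h䶴c狝"),
    (899, "d狜狜㐁geg阿狚ea䶴eead狜e"),
    (137, "齄鼾h狝ge㐀狛g狚阿"),
    (21, "﨩﨩e㐀c狛鼾d䶴﨨"),
    (767, "﨩c﨩g狜㐁狜狛齄阿﨩狚齄﨨䶵狝﨨"),
    (281, "﨨㐀啊aga啊c狝e鼾鼾"),
    (573, "㐁䶴hc﨨b狝㐁﨩䶴狜丄hc齄"),
    (966, "䶴丄狜﨨e狝eb狜㐁c㐀鼾﨩丄ga狚丄"),
    (565, "䶵㐀﨩㐀bb狛ehd丄ea丄㐀"),
    (778, "﨩㐁阿﨨狚bbea丄䶵丄狚鼾狚a䶵"),
    (363, "gd齄a鼾a䶴b㐁㐁fg鼾"),
    (822, "a阿狜䶵h䶵e狛h﨩gac狜阿㐀啊b"),
    (338, "b齄㐁ff阿e狜e㐀ba齄"),
]

NEEDLE = "丄"  # U+4E04; its UTF-8 encoding is the bytes E4 B8 84


def rows_like(rows, needle):
    """Row ids whose content contains needle (models LIKE '%needle%')."""
    return [row_id for row_id, content in rows if needle in content]


def rows_not_like(rows, needle):
    """Row ids whose content lacks needle (models NOT LIKE '%needle%')."""
    return [row_id for row_id, content in rows if needle not in content]


if __name__ == "__main__":
    print("expected for LIKE:    ", rows_like(ROWS, NEEDLE))
    print("expected for NOT LIKE:", rows_not_like(ROWS, NEEDLE))
```

Under these assumptions, the LIKE query in step 3 should return rows 190, 573, 966, 565 and 778, and the NOT LIKE query in step 5 should return the remaining eight rows.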
Attachments
Issue Links
- is related to HIVE-3245 UTF encoded data not displayed correctly by Hive driver (Open)