Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
None
Description
When loading/selecting from hive table, we treat both empty string and '\N' as null, this doesn't conform with hive behavior, in which, user can configure what null character is.
The issue can be explain in following steps:
1)
>select * from t1;
A B
----------- ------------------------------------------------------------------------------------------------------------------------
4
1 ?
2 fsdf
3 hhah
2)
unload with null_string '\N' into '/nulltest' select * from t1;
-bash-4.1$ swhdfs dfs -cat /nulltest/file0-20150410015053-828
4|
1|\N
2|fsdf
3|hhah
3)
hive> create external table nulltest ( A int, B string )
> row format delimited fields terminated by '|'
> location '/nulltest'
> tblproperties ('serialization.null.format' = '
N');
hive> select * from nulltest;
OK
nulltest.a nulltest.b
4 <-- empty string
1 NULL
2 fsdf
3 hhah
4)
>>select * from hive.hive.nulltest;
A B
----------- ------------------------------------------------------------------------------------------------------------------------
4 ? <-- empty string is considered null.
1 ?
2 fsdf
3 hhah
>select * from t1;
A B
----------- ------------------------------------------------------------------------------------------------------------------------
4 <-- should consist with t1 and hive.
1 ?
2 fsdf
3 hhah
Assigned to LaunchPad User khaled Bouaziz