Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Null values at the top level or inside collections are printed as "NULL":
select int_array from functional_parquet.complextypestbl;
+------------------------+
| int_array              |
+------------------------+
| [-1]                   |
| [1,2,3]                |
| [NULL,1,2,NULL,3,NULL] |
| []                     |
| NULL                   |
| NULL                   |
| NULL                   |
| NULL                   |
+------------------------+
Null values inside a struct are printed as "null":
select small_struct from functional_parquet.complextypes_structs;
+------------------------------------+
| small_struct                       |
+------------------------------------+
| NULL                               |
| {"i":19191,"s":"small_struct_str"} |
| {"i":98765,"s":null}               |
| {"i":null,"s":"str"}               |
| {"i":98765,"s":"abcde f"}          |
| {"i":null,"s":null}                |
+------------------------------------+
In Hive the situation is a bit different: "NULL" is used only for top level values and "null" is printed in both collections and structs.
select int_array from functional_parquet.complextypestbl;
+-------------------------+
| int_array               |
+-------------------------+
| [-1]                    |
| [1,2,3]                 |
| [null,1,2,null,3,null]  |
| []                      |
| NULL                    |
| NULL                    |
| NULL                    |
| NULL                    |
+-------------------------+
select small_struct from functional_parquet.complextypes_structs;
+-------------------------------------+
| small_struct                        |
+-------------------------------------+
| NULL                                |
| {"i":19191,"s":"small_struct_str"}  |
| {"i":98765,"s":null}                |
| {"i":null,"s":"str"}                |
| {"i":98765,"s":"abcde f"}           |
| {"i":null,"s":null}                 |
+-------------------------------------+
Officially we print collections and structs in JSON form. In JSON the relevant keyword is "null".
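To illustrate the JSON point, here is a small Python sketch (not taken from either engine's code) that uses the standard json module. It shows that a JSON serializer emits lowercase null and that a strict parser rejects the uppercase form. The helper name is_valid_json is made up for this example.

```python
import json

# Compact separators so the output matches the console tables above.
compact = (",", ":")

array_text = json.dumps([None, 1, 2, None, 3, None], separators=compact)
struct_text = json.dumps({"i": None, "s": "str"}, separators=compact)


def is_valid_json(text):
    """Hypothetical helper: True if a strict JSON parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(array_text)
print(struct_text)
print(is_valid_json("[null,1,2,null,3,null]"))
print(is_valid_json("[NULL,1,2,NULL,3,NULL]"))
```

So any client that feeds the printed collection into a JSON parser would fail on the "NULL" form.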
We should decide how to handle this. The options are:
- Use a uniform NULL representation everywhere (top level, collections and structs): either "NULL" or "null".
- Use "NULL" at the top level and "null" in collections and structs, like Hive.
- Leave everything as it is now: "NULL" at the top level and in collections, "null" in structs.
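To make the options above easier to compare, here is a rough Python sketch of a printer parameterized by policy. It is only an illustration, not the actual implementation: the policy names and the render function are hypothetical, and for the "current" policy it assumes that only the direct parent (collection or struct) decides the spelling, which the examples above do not confirm for deeper nesting.

```python
import json

# Hypothetical policy names, one per alternative listed above.
UNIFORM_UPPER = "uniform_upper"  # "NULL" everywhere
UNIFORM_LOWER = "uniform_lower"  # "null" everywhere
HIVE_LIKE = "hive_like"          # "NULL" at top level, "null" when nested
CURRENT = "current"              # "NULL" at top level and in collections, "null" in structs


def render(value, policy, top_level=True, in_struct=False):
    """Render a Python value as it might appear in a result cell."""
    if value is None:
        if policy == UNIFORM_UPPER:
            return "NULL"
        if policy == UNIFORM_LOWER:
            return "null"
        if policy == HIVE_LIKE:
            return "NULL" if top_level else "null"
        if policy == CURRENT:
            # Assumption: only the direct parent matters.
            return "null" if in_struct else "NULL"
        raise ValueError("unknown policy: " + policy)
    if isinstance(value, list):
        items = (render(v, policy, top_level=False, in_struct=False) for v in value)
        return "[" + ",".join(items) + "]"
    if isinstance(value, dict):
        fields = (
            json.dumps(k) + ":" + render(v, policy, top_level=False, in_struct=True)
            for k, v in value.items()
        )
        return "{" + ",".join(fields) + "}"
    # Scalars: reuse JSON formatting for numbers and quoted strings.
    return json.dumps(value)


for policy in (UNIFORM_UPPER, UNIFORM_LOWER, HIVE_LIKE, CURRENT):
    print(policy,
          render(None, policy),
          render([None, 1, 2], policy),
          render({"i": None, "s": "str"}, policy))
```

Only the "uniform lower" and "Hive-like" policies produce nested output that is valid JSON.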