Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
-
Description
When querying a nested struct that is not at the top level, NULL values are displayed at the wrong nesting level. E.g.:
select id, outer_struct.inner_struct3 from functional_orc_def.complextypes_nested_structs where id >= 4;
+----+----------------------------+
| id | outer_struct.inner_struct3 |
+----+----------------------------+
| 4  | {"s":{"i":null,"s":null}}  |
| 5  | {"s":null}                 |
+----+----------------------------+
However, in the first row 's' itself is expected to be null, not its members, and in the second row the result should be 'NULL'.
For reference see what is returned when querying 'outer_struct' instead of 'outer_struct.inner_struct3':
+----+-------------------------------------------------------------------------------------------------------------------------------+
| 4  | {"str":"","inner_struct1":{"str":"somestr2","de":12345.12},"inner_struct2":{"i":1,"str":"string"},"inner_struct3":{"s":null}} |
| 5  | {"str":null,"inner_struct1":null,"inner_struct2":null,"inner_struct3":null}                                                   |
+----+-------------------------------------------------------------------------------------------------------------------------------+
Note that this issue is with the ORC format.
After some digging I found that these incorrect null values are already present in the ORC scanner, where OrcStructReader reads the rows in the ReadValue() and ReadValueBatch() functions.
As a first step it would be nice to verify that the external ORC reader we use for reading the actual values from the files gives correct results.
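To make the expected behaviour concrete, here is a minimal Python sketch of the null semantics described above. It is not Impala or ORC reader code. The helper names `project` and `render` and the `ROWS` dictionary are hypothetical, and the row values are copied from the 'outer_struct' output shown earlier. The assumption is that a null at any level of the path makes the whole projected value null.

```python
import json

# Rows 4 and 5 as returned when querying 'outer_struct' (values from this ticket).
ROWS = {
    4: {
        "str": "",
        "inner_struct1": {"str": "somestr2", "de": 12345.12},
        "inner_struct2": {"i": 1, "str": "string"},
        "inner_struct3": {"s": None},
    },
    5: {
        "str": None,
        "inner_struct1": None,
        "inner_struct2": None,
        "inner_struct3": None,
    },
}


def project(struct, path):
    """Follow `path` into a nested struct (hypothetical helper).

    If the struct at any level is null, the projected value is null;
    the null is never pushed down into the members.
    """
    value = struct
    for field in path:
        if value is None:
            return None
        value = value[field]
    return value


def render(value):
    """Print a top-level null as NULL and nested nulls as JSON null."""
    if value is None:
        return "NULL"
    return json.dumps(value, separators=(",", ":"))


for row_id, outer_struct in ROWS.items():
    print(row_id, render(project(outer_struct, ["inner_struct3"])))
```

Under these assumptions row 4 renders as `{"s":null}` and row 5 as `NULL`, which is the result expected for 'outer_struct.inner_struct3'.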
Issue Links
- depends upon: IMPALA-9495 Allow Struct type in SELECT list for ORC tables (Closed)