Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
If a column appears multiple times in the select list, it is not duplicated in the underlying row: we recognise that the multiple result columns reference the same actual column, so the row size does not increase:
explain select id, outer_struct from functional_orc_def.complextypes_nested_structs;
Query: explain select id, outer_struct from functional_orc_def.complextypes_nested_structs
+---------------------------------------------------------------+
| Explain String |
+---------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=4.07MB Threads=2 |
| Per-Host Resource Estimates: Memory=20MB |
| Codegen disabled by planner |
| |
| PLAN-ROOT SINK |
| | |
| 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
| HDFS partitions=1/1 files=1 size=1.18KB |
| row-size=64B cardinality=5 |
+---------------------------------------------------------------+
With the id column duplicated:
explain select id, id, outer_struct from functional_orc_def.complextypes_nested_structs;
Query: explain select id, id, outer_struct from functional_orc_def.complextypes_nested_structs
+---------------------------------------------------------------+
| Explain String |
+---------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=4.07MB Threads=2 |
| Per-Host Resource Estimates: Memory=20MB |
| Codegen disabled by planner |
| |
| PLAN-ROOT SINK |
| | |
| 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
| HDFS partitions=1/1 files=1 size=1.18KB |
| row-size=64B cardinality=5 |
+---------------------------------------------------------------+
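As a rough illustration of why the row size stays at 64B in both plans, here is a minimal sketch of slot deduplication. It is not Impala's planner code: `COLUMN_SIZES` and `row_size` are hypothetical names, and the per-column byte sizes are assumptions chosen only so that the total matches the plans above.

```python
# Hypothetical model of slot deduplication; not Impala's planner code.
# Per-column byte sizes are assumptions, chosen only so the total is 64.
COLUMN_SIZES = {"id": 8, "outer_struct": 56}


def row_size(select_list):
    """Sum slot sizes, materializing each distinct column only once."""
    slots = {}
    for col in select_list:
        # A repeated reference maps to the slot that already exists.
        slots.setdefault(col, COLUMN_SIZES[col])
    return sum(slots.values())


print(row_size(["id", "outer_struct"]))        # 64
print(row_size(["id", "id", "outer_struct"]))  # 64
```

Because slots are keyed by the referenced column, the second `id` adds a result column but no new slot.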
However, if we query a struct and a subfield of the same struct, we do not reuse the existing slot in the row but duplicate the subexpression, which increases the row size from 64B to 80B:
explain select id, outer_struct, outer_struct.inner_struct2 from functional_orc_def.complextypes_nested_structs;
Query: explain select id, outer_struct, outer_struct.inner_struct2 from functional_orc_def.complextypes_nested_structs
+---------------------------------------------------------------+
| Explain String |
+---------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=4.09MB Threads=2 |
| Per-Host Resource Estimates: Memory=20MB |
| Codegen disabled by planner |
| |
| PLAN-ROOT SINK |
| | |
| 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
| HDFS partitions=1/1 files=1 size=1.18KB |
| row-size=80B cardinality=5 |
+---------------------------------------------------------------+
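The difference between the current behaviour and the requested one can be sketched as follows. This is only a toy model under stated assumptions: the `SIZES` table, the `covered` and `row_size` helpers, and the `reuse_struct_slots` flag are all hypothetical, and the byte sizes are invented so that the totals line up with the 64B and 80B shown in the plans. It does not describe how Impala actually lays out tuples.

```python
# Hypothetical sketch; names and byte sizes are assumptions, not Impala internals.
# Columns are identified by their path; a struct's size includes its subfields.
SIZES = {
    ("id",): 8,
    ("outer_struct",): 56,
    ("outer_struct", "inner_struct2"): 16,
}


def covered(path, slots):
    """True if `path` or one of its ancestors already has a slot."""
    return any(path[:n] in slots for n in range(1, len(path) + 1))


def row_size(select_list, reuse_struct_slots):
    slots = set()
    # Visit shorter paths first so a struct is allocated before its subfields.
    for path in sorted(select_list, key=len):
        if path in slots:
            continue  # identical reference: always deduplicated
        if reuse_struct_slots and covered(path, slots):
            continue  # subfield already lives inside the parent struct's slot
        slots.add(path)
    return sum(SIZES[p] for p in slots)


query = [("id",), ("outer_struct",), ("outer_struct", "inner_struct2")]
print(row_size(query, reuse_struct_slots=False))  # 80: subfield gets its own slot
print(row_size(query, reuse_struct_slots=True))   # 64: subfield reuses the struct's slot
```

With reuse enabled, a subfield is served from the parent struct's slot whenever that struct is already in the row, so selecting `outer_struct.inner_struct2` alongside `outer_struct` would not grow the row.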
Issue Links
- is duplicated by: IMPALA-10929 Optimise memory usage of structs in tuples (Resolved)
- is fixed by: IMPALA-10838 Error when struct returned from WITH() and used in an ORDER BY (Resolved)