IMPALA / IMPALA-11067

Unify struct subexpressions in rows

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Frontend

    Description

      If a column appears more than once in the select list, it is not duplicated in the underlying row: the planner recognises that the repeated result expressions reference the same column and reuses its slot, so the row size does not increase. As a baseline, with each column selected once (row-size=64B):

      explain select id, outer_struct from functional_orc_def.complextypes_nested_structs;
      Query: explain select id, outer_struct from functional_orc_def.complextypes_nested_structs
      +---------------------------------------------------------------+
      | Explain String                                                |
      +---------------------------------------------------------------+
      | Max Per-Host Resource Reservation: Memory=4.07MB Threads=2    |
      | Per-Host Resource Estimates: Memory=20MB                      |
      | Codegen disabled by planner                                   |
      |                                                               |
      | PLAN-ROOT SINK                                                |
      | |                                                             |
      | 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
      |    HDFS partitions=1/1 files=1 size=1.18KB                    |
      |    row-size=64B cardinality=5                                 |
      +---------------------------------------------------------------+
      

      With the id column duplicated, the row size is still 64B:

      explain select id, id, outer_struct from functional_orc_def.complextypes_nested_structs;
      Query: explain select id, id, outer_struct from functional_orc_def.complextypes_nested_structs
      +---------------------------------------------------------------+
      | Explain String                                                |
      +---------------------------------------------------------------+
      | Max Per-Host Resource Reservation: Memory=4.07MB Threads=2    |
      | Per-Host Resource Estimates: Memory=20MB                      |
      | Codegen disabled by planner                                   |
      |                                                               |
      | PLAN-ROOT SINK                                                |
      | |                                                             |
      | 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
      |    HDFS partitions=1/1 files=1 size=1.18KB                    |
      |    row-size=64B cardinality=5                                 |
      +---------------------------------------------------------------+
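The deduplication described above can be modelled with a small, hypothetical Python sketch: each distinct column path in the select list gets one slot, and a repeated path reuses it. This is not Impala's planner code; the schema, the fields f1, f2 and f3, and all byte sizes are invented for illustration, and only the names id, outer_struct and inner_struct2 come from the queries in this issue. Because the comparison is on exact paths, a subfield of an already selected struct still gets a slot of its own, which mirrors the struct case in this issue.

```python
# Hypothetical schema: leaf byte sizes and the fields f1..f3 are invented for
# illustration; only id, outer_struct and inner_struct2 come from the issue.
SCHEMA = {
    "id": 4,
    "outer_struct": {"f1": 8, "inner_struct2": {"f2": 8, "f3": 8}},
}


def resolve(path):
    """Return the type (a byte size or a dict of child fields) at a dotted path."""
    t = SCHEMA
    for name in path.split("."):
        t = t[name]
    return t


def type_size(t):
    """A leaf is its byte size; a struct is the sum of its children."""
    return t if isinstance(t, int) else sum(type_size(c) for c in t.values())


def row_size_exact_dedup(select_list):
    """Allocate one slot per distinct path; a repeated path reuses its slot.

    Paths are compared for exact equality only, so a subfield of an
    already selected struct still gets a separate slot.
    """
    slots = {}
    for path in select_list:
        if path not in slots:  # exact-path deduplication
            slots[path] = type_size(resolve(path))
    return sum(slots.values())


print(row_size_exact_dedup(["id", "outer_struct"]))        # 28
print(row_size_exact_dedup(["id", "id", "outer_struct"]))  # 28
print(row_size_exact_dedup(["id", "outer_struct", "outer_struct.inner_struct2"]))  # 44
```

The absolute numbers differ from the 64B and 80B in the EXPLAIN output because the sizes are made up; only the pattern (no growth for a repeated column, growth for a struct subfield) is meant to match.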
      

      However, if we query a struct together with one of its subfields, the existing slot in the row is not reused; the subexpression is duplicated instead, and the row size grows from 64B to 80B:

      explain select id, outer_struct, outer_struct.inner_struct2 from functional_orc_def.complextypes_nested_structs;
      Query: explain select id, outer_struct, outer_struct.inner_struct2 from functional_orc_def.complextypes_nested_structs
      +---------------------------------------------------------------+
      | Explain String                                                |
      +---------------------------------------------------------------+
      | Max Per-Host Resource Reservation: Memory=4.09MB Threads=2    |
      | Per-Host Resource Estimates: Memory=20MB                      |
      | Codegen disabled by planner                                   |
      |                                                               |
      | PLAN-ROOT SINK                                                |
      | |                                                             |
      | 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
      |    HDFS partitions=1/1 files=1 size=1.18KB                    |
      |    row-size=80B cardinality=5                                 |
      +---------------------------------------------------------------+
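One way to unify such subexpressions can be sketched in Python under a hypothetical schema (the fields f1, f2, f3 and all byte sizes are invented for illustration and are neither the real table layout nor Impala's implementation): map every select-list item to its outermost selected ancestor and, when one exists, read the subfield out of that ancestor's slot instead of allocating a new one. Only the distinct slot paths then contribute to the row size.

```python
# Hypothetical schema: leaf byte sizes and the fields f1..f3 are invented for
# illustration; only id, outer_struct and inner_struct2 come from the issue.
SCHEMA = {
    "id": 4,
    "outer_struct": {"f1": 8, "inner_struct2": {"f2": 8, "f3": 8}},
}


def resolve(path):
    """Return the type (a byte size or a dict of child fields) at a dotted path."""
    t = SCHEMA
    for name in path.split("."):
        t = t[name]
    return t


def type_size(t):
    """A leaf is its byte size; a struct is the sum of its children."""
    return t if isinstance(t, int) else sum(type_size(c) for c in t.values())


def assign_slots(select_list):
    """Map each select-list item to (slot path, path inside that slot).

    The slot is the outermost selected ancestor of the item (possibly the
    item itself), so a subfield of a selected struct reuses the struct's slot.
    """
    selected = set(select_list)
    result = []
    for path in select_list:
        parts = path.split(".")
        # Try prefixes from shortest (outermost) to the full path; the full
        # path is always in `selected`, so a slot is always found.
        for i in range(1, len(parts) + 1):
            prefix = ".".join(parts[:i])
            if prefix in selected:
                result.append((prefix, ".".join(parts[i:])))
                break
    return result


def row_size_unified(select_list):
    """Row size when only the distinct slot paths are materialized."""
    slots = {slot for slot, _ in assign_slots(select_list)}
    return sum(type_size(resolve(s)) for s in slots)


query = ["id", "outer_struct", "outer_struct.inner_struct2"]
print(assign_slots(query))
# [('id', ''), ('outer_struct', ''), ('outer_struct', 'inner_struct2')]
print(row_size_unified(query))                                 # 28
print(row_size_unified(["id", "outer_struct.inner_struct2"]))  # 20
```

Checking prefixes from the outermost one keeps the result independent of select-list order: selecting the subfield before its parent still maps both to the parent's slot, and a subfield selected without its parent keeps its own slot.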
      

       

       

People

    • Assignee: Daniel Becker
    • Reporter: Daniel Becker