Hive / HIVE-4809

ReduceSinkOperator of PTFOperator can have redundant key columns

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0, 0.12.0, 0.13.0, 0.14.0
    • Fix Version/s: 1.1.0
    • Component/s: PTF-Windowing
    • Labels:
      None

      Description

      Consider a simple query such as:

      SELECT x.a, x.b, count(x.b) OVER (PARTITION BY x.a) FROM src x;
      

      Its plan is:

      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 is a root stage
      
      STAGE PLANS:
        Stage: Stage-1
          Map Reduce
            Alias -> Map Operator Tree:
              x 
                TableScan
                  alias: x
                  Reduce Output Operator
                    key expressions:
                          expr: a
                          type: int
                          expr: a
                          type: int
                    sort order: ++
                    Map-reduce partition columns:
                          expr: a
                          type: int
                    tag: -1
                    value expressions:
                          expr: a
                          type: int
                          expr: b
                          type: string
            Reduce Operator Tree:
              Extract
                PTF Operator
                  Select Operator
                    expressions:
                          expr: _col0
                          type: int
                          expr: _col1
                          type: string
                          expr: _wcol0
                          type: bigint
                    outputColumnNames: _col0, _col1, _col2
                    File Output Operator
                      compressed: false
                      GlobalTableId: 0
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
      

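      The key expressions of the ReduceSinkOperator (shown as "Reduce Output Operator" in the plan) list column "a" twice, with sort order "++", even though the query references "a" only once in PARTITION BY. Every map output record therefore carries the same key column twice, which can increase the size of the map output.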
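
      The plan above can be regenerated with EXPLAIN. The following is a minimal sketch: the table definition is an assumption, with column types inferred from the plan (a: int, b: string), and the exact plan text may differ between Hive versions.

      -- Assumed table definition; only the column types come from the plan above.
      CREATE TABLE src (a INT, b STRING);

      -- Print the plan and check "key expressions" under "Reduce Output Operator".
      EXPLAIN
      SELECT x.a, x.b, count(x.b) OVER (PARTITION BY x.a) FROM src x;

      On an affected version, "key expressions" should list "a" twice. A plan without the redundancy would list it once, with a single "+" in the sort order.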


              People

              • Assignee: Navis (navis)
              • Reporter: Yin Huai (yhuai)
