HIVE-4809

ReduceSinkOperator of PTFOperator can have redundant key columns

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0, 0.12.0, 0.13.0, 0.14.0
    • Fix Version/s: 1.1.0
    • Component/s: PTF-Windowing
    • Labels:
      None

      Description

      Consider a simple windowing query such as:

      SELECT x.a, x.b, count(x.b) OVER (PARTITION BY x.a) FROM src x;
      

      Its plan is:

      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 is a root stage
      
      STAGE PLANS:
        Stage: Stage-1
          Map Reduce
            Alias -> Map Operator Tree:
              x 
                TableScan
                  alias: x
                  Reduce Output Operator
                    key expressions:
                          expr: a
                          type: int
                          expr: a
                          type: int
                    sort order: ++
                    Map-reduce partition columns:
                          expr: a
                          type: int
                    tag: -1
                    value expressions:
                          expr: a
                          type: int
                          expr: b
                          type: string
            Reduce Operator Tree:
              Extract
                PTF Operator
                  Select Operator
                    expressions:
                          expr: _col0
                          type: int
                          expr: _col1
                          type: string
                          expr: _wcol0
                          type: bigint
                    outputColumnNames: _col0, _col1, _col2
                    File Output Operator
                      compressed: false
                      GlobalTableId: 0
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
      

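      The ReduceSinkOperator (shown as "Reduce Output Operator" in the plan) lists "a" twice in its key expressions, with sort order "++", although the query partitions by "a" alone. The second copy cannot change how rows are sorted or partitioned, so it only increases the size of the map output.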
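
The redundancy described above can be illustrated with a minimal sketch. This is not Hive's actual fix: `dedupe_keys` is a hypothetical helper, and the key expressions are modeled as plain strings with one sort-direction character per key, mirroring the "key expressions" and "sort order" fields of the plan. It keeps only the first occurrence of each key, which is safe because a repeated key can only break ties between rows that already agree on that key.

```python
def dedupe_keys(key_exprs, sort_order):
    """Drop repeated key expressions, keeping the first occurrence
    of each expression together with its sort direction.

    key_exprs  -- list of key expression strings, e.g. ["a", "a"]
    sort_order -- one '+' or '-' per key expression, e.g. "++"
    """
    if len(key_exprs) != len(sort_order):
        raise ValueError("one sort direction is required per key expression")

    seen = set()
    kept_exprs = []
    kept_order = []
    for expr, direction in zip(key_exprs, sort_order):
        if expr in seen:
            # Rows that tie on the first occurrence have equal values for
            # this expression, so the repeat cannot affect the ordering.
            continue
        seen.add(expr)
        kept_exprs.append(expr)
        kept_order.append(direction)
    return kept_exprs, "".join(kept_order)


# The key list from the plan above: "a" twice, sort order "++".
keys, order = dedupe_keys(["a", "a"], "++")
print(keys, order)
```

For the plan in this issue, the sketch reduces the key list to a single "a" with sort order "+", which is the shape the ReduceSinkOperator would need to avoid the extra map output.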

            People

            • Assignee: Navis
            • Reporter: Yin Huai
            • Votes: 0
            • Watchers: 3
