Hive / HIVE-4809

ReduceSinkOperator of PTFOperator can have redundant key columns

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0, 0.12.0, 0.13.0, 0.14.0
    • Fix Version/s: 1.1.0
    • Component/s: PTF-Windowing
    • Labels:
      None

      Description

      For example, consider this simple query:

      SELECT x.a, x.b, count(x.b) OVER (PARTITION BY x.a) FROM src x;
      

      Its plan is:

      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 is a root stage
      
      STAGE PLANS:
        Stage: Stage-1
          Map Reduce
            Alias -> Map Operator Tree:
              x 
                TableScan
                  alias: x
                  Reduce Output Operator
                    key expressions:
                          expr: a
                          type: int
                          expr: a
                          type: int
                    sort order: ++
                    Map-reduce partition columns:
                          expr: a
                          type: int
                    tag: -1
                    value expressions:
                          expr: a
                          type: int
                          expr: b
                          type: string
            Reduce Operator Tree:
              Extract
                PTF Operator
                  Select Operator
                    expressions:
                          expr: _col0
                          type: int
                          expr: _col1
                          type: string
                          expr: _wcol0
                          type: bigint
                    outputColumnNames: _col0, _col1, _col2
                    File Output Operator
                      compressed: false
                      GlobalTableId: 0
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
      

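      The ReduceSinkOperator lists "a" twice among its key expressions (hence the sort order "++"), although a single "a" key is enough to partition and sort the rows. This redundancy increases the size of the map output, because the duplicated key column is serialized for every row the map side emits.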
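      The attached patch (HIVE-4809.1.patch.txt) is not reproduced here. As a hedged illustration only, the Java sketch below shows one way to build a reduce key from the PTF partition columns followed by the order columns while skipping any column already in the key. Dropping a later duplicate is safe because every row within one partition has the same value for each partition column, so sorting on that column again changes nothing. `DedupReduceKeys`, `KeyColumn`, `buildKeys`, and `sortOrder` are hypothetical names, not Hive's API, and plain strings stand in for Hive's key expressions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Hive code): build a ReduceSink key from PTF
 * partition columns and order columns without repeating any column.
 * Strings stand in for Hive's key expressions.
 */
public class DedupReduceKeys {

    /** A key column plus its sort direction ('+' ascending, '-' descending). */
    public record KeyColumn(String expr, char order) {}

    public static List<KeyColumn> buildKeys(List<String> partitionCols, List<KeyColumn> orderCols) {
        // LinkedHashMap preserves insertion order, so the first occurrence of
        // each expression keeps its position in the key.
        Map<String, KeyColumn> keys = new LinkedHashMap<>();
        for (String col : partitionCols) {
            // Partition columns are assumed to sort ascending, matching "+" in the plan.
            keys.putIfAbsent(col, new KeyColumn(col, '+'));
        }
        for (KeyColumn oc : orderCols) {
            // Rows in one partition share the partition column values, so a
            // later order key on a partition column adds nothing and is skipped.
            keys.putIfAbsent(oc.expr(), oc);
        }
        return new ArrayList<>(keys.values());
    }

    /** Renders the key's sort directions the way the plan prints "sort order". */
    public static String sortOrder(List<KeyColumn> keys) {
        StringBuilder sb = new StringBuilder();
        for (KeyColumn k : keys) {
            sb.append(k.order());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // PARTITION BY x.a with no ORDER BY: assumed here to order by the
        // partition column, which is what the "++" key in the plan suggests.
        List<KeyColumn> keys = buildKeys(List.of("a"), List.of(new KeyColumn("a", '+')));
        System.out.println(keys.size() + " " + sortOrder(keys)); // prints "1 +"
    }
}
```

      With this sketch the example query's key shrinks from "a, a" (sort order "++") to a single "a" (sort order "+").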


          Activity

          Brock Noland made changes -
          Fix Version/s 1.1.0 [ 12329363 ]
          Fix Version/s 0.15.0 [ 12328723 ]
          Ashutosh Chauhan made changes -
          Affects Version/s 0.14.0 [ 12326450 ]
          Affects Version/s 0.13.0 [ 12324986 ]
          Affects Version/s 0.12.0 [ 12324312 ]
          Ashutosh Chauhan made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 0.15.0 [ 12328723 ]
          Resolution Fixed [ 1 ]
          Navis made changes -
          Remote Link This issue links to "review board (Web Link)" [ 21993 ]
          Navis made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Assignee Yin Huai [ yhuai ] Navis [ navis ]
          Navis made changes -
          Attachment HIVE-4809.1.patch.txt [ 12692892 ]
          Ashutosh Chauhan made changes -
          Affects Version/s 0.11.0 [ 12323587 ]
          Ashutosh Chauhan made changes -
          Component/s PTF-Windowing [ 12320378 ]
          Yin Huai made changes -
          Field Original Value New Value
          Assignee Yin Huai [ yhuai ]
          Yin Huai created issue -

            People

            • Assignee:
              Navis
              Reporter:
              Yin Huai
            • Votes:
              0
              Watchers:
              3