Hive / HIVE-4809

ReduceSinkOperator of PTFOperator can have redundant key columns


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0, 0.12.0, 0.13.0, 0.14.0
    • Fix Version/s: 1.1.0
    • Component/s: PTF-Windowing
    • Labels: None

    Description

      Consider a simple windowing query:

      SELECT x.a, x.b, count(x.b) OVER (PARTITION BY x.a) FROM src x;
      

      Its plan is:

      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 is a root stage
      
      STAGE PLANS:
        Stage: Stage-1
          Map Reduce
            Alias -> Map Operator Tree:
              x 
                TableScan
                  alias: x
                  Reduce Output Operator
                    key expressions:
                          expr: a
                          type: int
                          expr: a
                          type: int
                    sort order: ++
                    Map-reduce partition columns:
                          expr: a
                          type: int
                    tag: -1
                    value expressions:
                          expr: a
                          type: int
                          expr: b
                          type: string
            Reduce Operator Tree:
              Extract
                PTF Operator
                  Select Operator
                    expressions:
                          expr: _col0
                          type: int
                          expr: _col1
                          type: string
                          expr: _wcol0
                          type: bigint
                    outputColumnNames: _col0, _col1, _col2
                    File Output Operator
                      compressed: false
                      GlobalTableId: 0
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
      

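      The ReduceSinkOperator (shown as "Reduce Output Operator" in Stage-1) lists column "a" twice in its key expressions, with sort order "++". The second "a" is redundant: it does not change how rows are sorted or partitioned, but it can increase the size of the map output.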
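
      The redundancy described above can be illustrated with a minimal sketch. It is not the code from the attached patch; the function name dedupe_sort_keys and the representation of keys as plain strings are assumptions made for illustration only. It drops repeated key expressions and keeps the sort-order string aligned with the keys that remain:

```python
def dedupe_sort_keys(key_exprs, sort_order):
    """Drop repeated key expressions, keeping the first occurrence of each.

    Hypothetical helper for illustration; not part of Hive.
    key_exprs  -- list of key expression strings, e.g. ["a", "a"]
    sort_order -- one '+' (ascending) or '-' (descending) per key
    """
    if len(key_exprs) != len(sort_order):
        raise ValueError("need exactly one sort direction per key expression")
    seen = set()
    kept_exprs, kept_order = [], []
    for expr, direction in zip(key_exprs, sort_order):
        if expr in seen:
            # Rows that tie on the first occurrence are equal on this
            # expression too, so a repeat cannot change the ordering.
            continue
        seen.add(expr)
        kept_exprs.append(expr)
        kept_order.append(direction)
    return kept_exprs, "".join(kept_order)


# Key expressions and sort order taken from the plan above.
keys, order = dedupe_sort_keys(["a", "a"], "++")
print(keys, order)
```

      Applied to the plan above, the sketch reduces the key expressions to a single "a" with sort order "+". The partition columns and value expressions are left as they are.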

      Attachments

        1. HIVE-4809.1.patch.txt (104 kB, Navis Ryu)

      People

        Navis Ryu (navis)
        Yin Huai (yhuai)