Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-16852

PTF: RANK() re-evaluates order predicates on the reducer

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 2.1.1, 3.0.0
    • None
    • Physical Optimizer

    Description

      explain select ss_item_sk, rank() over(order by cast(ss_list_price as decimal(38,10))) as r , ss_list_price from store_sales;
      
      STAGE PLANS:
        Stage: Stage-1
          Tez
            DagId: root_20170608015140_7b0debb9-b14b-4150-b004-9743c6127392:3
            Edges:
              Reducer 2 <- Map 1 (SIMPLE_EDGE)
            DagName:
            Vertices:
              Map 1
                  Map Operator Tree:
                      TableScan
                        alias: store_sales
                        Statistics: Num rows: 28800426268 Data size: 450435120648 Basic stats: COMPLETE Column stats: COMPLETE
                        Reduce Output Operator
                          key expressions: 0 (type: int), CAST( ss_list_price AS decimal(38,10)) (type: decimal(38,10))
                          sort order: ++
                          Map-reduce partition columns: 0 (type: int)
                          Statistics: Num rows: 28800426268 Data size: 450435120648 Basic stats: COMPLETE Column stats: COMPLETE
                          value expressions: ss_item_sk (type: bigint), ss_list_price (type: double)
                  Execution mode: vectorized, llap
                  LLAP IO: all inputs
              Reducer 2 
                  Execution mode: llap
                  Reduce Operator Tree:
                    Select Operator
                      expressions: VALUE._col1 (type: bigint), VALUE._col11 (type: double)
                      outputColumnNames: _col1, _col11
                      Statistics: Num rows: 28800426268 Data size: 8399352770616 Basic stats: COMPLETE Column stats: COMPLETE
                      PTF Operator
                        Function definitions:
                            Input definition
                              input alias: ptf_0
                              output shape: _col1: bigint, _col11: double
                              type: WINDOWING
                            Windowing table definition
                              input alias: ptf_1
                              name: windowingtablefunction
                              order by: CAST( _col11 AS decimal(38,10)) ASC NULLS FIRST
                              partition by: 0
                              raw input shape:
                              window functions:
                                  window function definition
                                    alias: rank_window_0
                                    arguments: CAST( _col11 AS decimal(38,10))
                                    name: rank
                                    window function: GenericUDAFRankEvaluator
                                    window frame: PRECEDING(MAX)~FOLLOWING(MAX)
                                    isPivotResult: true
                        Statistics: Num rows: 28800426268 Data size: 8399352770616 Basic stats: COMPLETE Column stats: COMPLETE
                        Select Operator
                          expressions: _col1 (type: bigint), rank_window_0 (type: int), _col11 (type: double)
                          outputColumnNames: _col0, _col1, _col2
                          Statistics: Num rows: 28800426268 Data size: 565636825720 Basic stats: COMPLETE Column stats: COMPLETE
                          File Output Operator
                            compressed: false
                            Statistics: Num rows: 28800426268 Data size: 565636825720 Basic stats: COMPLETE Column stats: COMPLETE
                            table:
                                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      

      This forces the Decimal cast to be evaluated ~2x - once to produce the KEY expression and once within the window function.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              gopalv Gopal Vijayaraghavan
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated: