Status: Open
Resolution: Unresolved
2.1.1, 3.0.0
explain select ss_item_sk, rank() over(order by cast(ss_list_price as decimal(38,10))) as r , ss_list_price from store_sales;

STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: root_20170608015140_7b0debb9-b14b-4150-b004-9743c6127392:3
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName:
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  Statistics: Num rows: 28800426268 Data size: 450435120648 Basic stats: COMPLETE Column stats: COMPLETE
                  Reduce Output Operator
                    key expressions: 0 (type: int), CAST( ss_list_price AS decimal(38,10)) (type: decimal(38,10))
                    sort order: ++
                    Map-reduce partition columns: 0 (type: int)
                    Statistics: Num rows: 28800426268 Data size: 450435120648 Basic stats: COMPLETE Column stats: COMPLETE
                    value expressions: ss_item_sk (type: bigint), ss_list_price (type: double)
            Execution mode: vectorized, llap
            LLAP IO: all inputs
        Reducer 2
            Execution mode: llap
            Reduce Operator Tree:
              Select Operator
                expressions: VALUE._col1 (type: bigint), VALUE._col11 (type: double)
                outputColumnNames: _col1, _col11
                Statistics: Num rows: 28800426268 Data size: 8399352770616 Basic stats: COMPLETE Column stats: COMPLETE
                PTF Operator
                  Function definitions:
                      Input definition
                        input alias: ptf_0
                        output shape: _col1: bigint, _col11: double
                        type: WINDOWING
                      Windowing table definition
                        input alias: ptf_1
                        name: windowingtablefunction
                        order by: CAST( _col11 AS decimal(38,10)) ASC NULLS FIRST
                        partition by: 0
                        raw input shape:
                        window functions:
                            window function definition
                              alias: rank_window_0
                              arguments: CAST( _col11 AS decimal(38,10))
                              name: rank
                              window function: GenericUDAFRankEvaluator
                              window frame: PRECEDING(MAX)~FOLLOWING(MAX)
                              isPivotResult: true
                  Statistics: Num rows: 28800426268 Data size: 8399352770616 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: _col1 (type: bigint), rank_window_0 (type: int), _col11 (type: double)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 28800426268 Data size: 565636825720 Basic stats: COMPLETE Column stats: COMPLETE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 28800426268 Data size: 565636825720 Basic stats: COMPLETE Column stats: COMPLETE
                      table:
                          input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                          output format:
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
This forces the decimal cast to be evaluated roughly twice per row: once in Map 1 to produce the KEY expression, and once more in Reducer 2 as the argument of the rank window function.
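The cost of the repeated cast can be illustrated with a toy model. The sketch below is not Hive code: `CountingCast`, `plan_as_reported`, and `plan_reusing_key` are hypothetical names, and the "shuffle" is just an in-memory sort. It only counts how often the cast runs when the reducer recomputes it from the value column, compared with a variant that reuses the key that was already shuffled.

```python
from decimal import Decimal

SCALE_10 = Decimal("1.0000000000")  # ten fractional digits, as in decimal(38,10)


class CountingCast:
    """Hypothetical stand-in for CAST(x AS decimal(38,10)) that counts its calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, value):
        self.calls += 1
        return Decimal(str(value)).quantize(SCALE_10)


def rank(sorted_keys):
    """SQL-style rank(): ties share a rank and the following rank skips ahead."""
    ranks = []
    for i, key in enumerate(sorted_keys):
        if i > 0 and key == sorted_keys[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(i + 1)
    return ranks


def shuffle(rows, cast):
    # "Map side": compute the sort key once per row, carry the raw columns as values.
    keyed = [(cast(price), item, price) for item, price in rows]
    return sorted(keyed, key=lambda t: t[0])


def plan_as_reported(rows, cast):
    shuffled = shuffle(rows, cast)
    # "Reduce side": the window function re-evaluates the cast from the value column.
    keys = [cast(price) for _, _, price in shuffled]
    return [(item, r, price) for (_, item, price), r in zip(shuffled, rank(keys))]


def plan_reusing_key(rows, cast):
    shuffled = shuffle(rows, cast)
    # "Reduce side": reuse the key that was already computed for the shuffle.
    keys = [key for key, _, _ in shuffled]
    return [(item, r, price) for (_, item, price), r in zip(shuffled, rank(keys))]


rows = [(1, 9.99), (2, 5.0), (3, 9.99)]  # (ss_item_sk, ss_list_price), made-up values
reported_cast, reuse_cast = CountingCast(), CountingCast()
reported = plan_as_reported(rows, reported_cast)
reused = plan_reusing_key(rows, reuse_cast)
print(reported_cast.calls, reuse_cast.calls)
```

In this model both variants produce the same ranks, and the only difference is the number of cast invocations. Whether Hive can reference the shuffled KEY column inside the PTF operator in this way is an assumption of the sketch, not something the plan above confirms.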
Issue Links
- relates to HIVE-16541 PTF: Avoid shuffling constant keys for empty OVER() (Patch Available)