explain select * > from( > select i_category, i_class, i_brand, > s_store_name, s_company_name, > d_moy, > sum(ss_sales_price) sum_sales, > avg(sum(ss_sales_price)) over > (partition by i_category, i_brand, s_store_name, s_company_name) > avg_monthly_sales > from item, store_sales, date_dim, store > where store_sales.ss_item_sk = item.i_item_sk and > store_sales.ss_sold_date_sk = date_dim.d_date_sk and > store_sales.ss_store_sk = store.s_store_sk and > d_year in (2000) and > ((i_category in ('Home','Books','Electronics') and > i_class in ('wallpaper','parenting','musical') > ) > or (i_category in ('Shoes','Jewelry','Men') and > i_class in ('womens','birdal','pants') > )) > group by i_category, i_class, i_brand, > s_store_name, s_company_name, d_moy) tmp1 > where case when (avg_monthly_sales <> 0) then (abs(sum_sales - avg_monthly_sales) / avg_monthly_sales) else null end > 0.1 > order by sum_sales - avg_monthly_sales, s_store_name > limit 100;; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 1 <- Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE) Reducer 2 <- Map 1 (SIMPLE_EDGE) Reducer 3 <- Reducer 2 (SIMPLE_EDGE) Reducer 4 <- Reducer 3 (SIMPLE_EDGE) DagName: rajesh_20150430153315_46a8a31a-7186-49d6-804a-d9ce77d26a64:1 Vertices: Map 1 Map Operator Tree: TableScan alias: store_sales filterExpr: (ss_store_sk is not null and ss_item_sk is not null) (type: boolean) Statistics: Num rows: 27503885621 Data size: 4507984789140 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (ss_store_sk is not null and ss_item_sk is not null) (type: boolean) Statistics: Num rows: 26856185846 Data size: 849276782864 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: ss_item_sk (type: bigint), ss_store_sk (type: bigint), ss_sales_price (type: double), ss_sold_date_sk (type: bigint) outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 26856185846 Data size: 849276782864 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 keys: 0 _col3 (type: bigint) 1 _col0 (type: bigint) outputColumnNames: _col0, _col1, _col2, _col6 input vertices: 1 Map 5 Statistics: Num rows: 30028020934 Data size: 840784586152 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 keys: 0 _col1 (type: bigint) 1 _col0 (type: bigint) outputColumnNames: _col0, _col2, _col6, _col8, _col9 input vertices: 1 Map 6 Statistics: Num rows: 30028021760 Data size: 5975576330240 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 keys: 0 _col0 (type: bigint) 1 _col0 (type: bigint) outputColumnNames: _col2, _col6, _col8, _col9, _col11, _col12, _col13 input vertices: 1 Map 7 Statistics: Num rows: 15014010880 Data size: 7101627146240 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col13 (type: string), _col12 (type: string), _col11 (type: string), _col8 (type: string), _col9 (type: string), _col6 (type: int), _col2 (type: double) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6 Statistics: Num rows: 15014010880 Data size: 7101627146240 Basic stats: COMPLETE Column stats: COMPLETE Group By Operator aggregations: sum(_col6) keys: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: int) mode: hash outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6 Statistics: Num rows: 7507005440 Data size: 3550813573120 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: int) sort order: ++++++ Map-reduce partition columns: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: int) Statistics: Num rows: 7507005440 Data size: 3550813573120 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col6 (type: double) Execution mode: vectorized Map 5 Map Operator Tree: TableScan alias: date_dim filterExpr: ((d_year) IN (2000) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ((d_year) IN (2000) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 36524 Data size: 584384 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: d_date_sk (type: bigint), d_moy (type: int) outputColumnNames: _col0, _col2 Statistics: Num rows: 36524 Data size: 438288 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: bigint) sort order: + Map-reduce partition columns: _col0 (type: bigint) Statistics: Num rows: 36524 Data size: 438288 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col2 (type: int) Select Operator expressions: _col0 (type: bigint) outputColumnNames: _col0 Statistics: Num rows: 36524 Data size: 438288 Basic stats: COMPLETE Column stats: COMPLETE Group By Operator keys: _col0 (type: bigint) mode: hash outputColumnNames: _col0 Statistics: Num rows: 18262 Data size: 219144 Basic stats: COMPLETE Column stats: COMPLETE Dynamic Partitioning Event Operator Target Input: store_sales Partition key expr: ss_sold_date_sk Statistics: Num rows: 18262 Data size: 219144 Basic stats: COMPLETE Column stats: COMPLETE Target column: ss_sold_date_sk Target Vertex: Map 1 Execution mode: vectorized Map 6 Map Operator Tree: TableScan alias: store filterExpr: s_store_sk is not null (type: boolean) Statistics: Num rows: 1500 Data size: 2889168 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: s_store_sk is not null (type: boolean) Statistics: Num rows: 1500 Data size: 280500 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: s_store_sk (type: bigint), s_store_name (type: string), s_company_name (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1500 Data size: 280500 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: bigint) sort order: + Map-reduce partition columns: _col0 (type: bigint) Statistics: Num rows: 1500 Data size: 280500 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: string), _col2 (type: string) Execution mode: vectorized Map 7 Map Operator Tree: TableScan alias: item filterExpr: ((((i_category) IN ('Home', 'Books', 'Electronics') and (i_class) IN ('wallpaper', 'parenting', 'musical')) or ((i_category) IN ('Shoes', 'Jewelry', 'Men') and (i_class) IN ('womens', 'birdal', 'pants'))) and i_item_sk is not null) (type: boolean) Statistics: Num rows: 402000 Data size: 582462184 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ((((i_category) IN ('Home', 'Books', 'Electronics') and (i_class) IN ('wallpaper', 'parenting', 'musical')) or ((i_category) IN ('Shoes', 'Jewelry', 'Men') and (i_class) IN ('womens', 'birdal', 'pants'))) and i_item_sk is not null) (type: boolean) Statistics: Num rows: 201000 Data size: 58290000 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: i_item_sk (type: bigint), i_brand (type: string), i_class (type: string), i_category (type: string) outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 201000 Data size: 58290000 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: bigint) sort order: + Map-reduce partition columns: _col0 (type: bigint) Statistics: Num rows: 201000 Data size: 58290000 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: string), _col2 (type: string), _col3 (type: string) Execution mode: vectorized Reducer 2 Reduce Operator Tree: Group By Operator aggregations: sum(VALUE._col0) keys: KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: string), KEY._col3 (type: string), KEY._col4 (type: string), KEY._col5 (type: int) mode: mergepartial outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6 Statistics: Num rows: 1940040 Data size: 917638920 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string) sort order: ++++ Map-reduce partition columns: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string) Statistics: Num rows: 1940040 Data size: 917638920 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: string), _col5 (type: int), _col6 (type: double) Execution mode: vectorized Reducer 3 Reduce Operator Tree: Select Operator expressions: KEY.reducesinkkey0 (type: string), VALUE._col0 (type: string), KEY.reducesinkkey1 (type: string), KEY.reducesinkkey2 (type: string), KEY.reducesinkkey3 (type: string), VALUE._col1 (type: int), VALUE._col2 (type: double) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6 Statistics: Num rows: 1940040 Data size: 917638920 Basic stats: COMPLETE Column stats: COMPLETE PTF Operator Function definitions: Input definition input alias: ptf_0 output shape: _col0: string, _col1: string, _col2: string, _col3: string, _col4: string, _col5: int, _col6: double type: WINDOWING Windowing table definition input alias: ptf_1 name: windowingtablefunction order by: _col0, _col2, _col3, _col4 partition by: _col0, _col2, _col3, _col4 raw input shape: window functions: window function definition alias: avg_window_0 arguments: _col6 name: avg window function: GenericUDAFAverageEvaluatorDouble window frame: PRECEDING(MAX)~FOLLOWING(MAX) Statistics: Num rows: 1940040 Data size: 917638920 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (CASE WHEN ((avg_window_0 <> 0.0)) THEN ((abs((_col6 - avg_window_0)) / avg_window_0)) ELSE (null) END > 0.1) (type: boolean) Statistics: Num rows: 646680 Data size: 305879640 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: int), _col6 (type: double), avg_window_0 (type: double) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7 Statistics: Num rows: 646680 Data size: 311053080 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: (_col6 - _col7) (type: double), _col3 (type: string) sort order: ++ Statistics: Num rows: 646680 Data size: 311053080 Basic stats: COMPLETE Column stats: COMPLETE TopN Hash Memory Usage: 0.04 value expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col4 (type: string), _col5 (type: int), _col6 (type: double), _col7 (type: double) Reducer 4 Reduce Operator Tree: Select Operator expressions: VALUE._col0 (type: string), VALUE._col1 (type: string), VALUE._col2 (type: string), KEY.reducesinkkey1 (type: string), VALUE._col3 (type: string), VALUE._col4 (type: int), VALUE._col5 (type: double), VALUE._col6 (type: double) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7 Statistics: Num rows: 646680 Data size: 311053080 Basic stats: COMPLETE Column stats: COMPLETE Limit Number of rows: 100 Statistics: Num rows: 100 Data size: 48100 Basic stats: COMPLETE Column stats: COMPLETE File Output Operator compressed: false Statistics: Num rows: 100 Data size: 48100 Basic stats: COMPLETE Column stats: COMPLETE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Execution mode: vectorized Stage: Stage-0 Fetch Operator limit: 100 Processor Tree: ListSink